(54) Title: APPARATUS FOR THE IDENTIFICATION OF FREE-LYING CELLS



(57) Abstract 

A free-lying cell classifier. An automated microscope
system (511) comprising a computer (540) and high speed
processing field of view processors (568) identifies free-lying
cells (80, 82). An image (11) of a biological specimen is
obtained and the image (11) is segmented (10) to create a
set of binary masks (15). The binary masks (15) are used
by a feature calculator (12) to compute the features that
characterize objects of interest (80, 82) including free-lying
cells, artifacts and other biological objects. The objects (80,
82) are classified to identify their type, their normality or
abnormality or their identification as an artifact. The results
are summarized and reported (18). A stain evaluation (20) of
the slide is performed as well as a typicality evaluation (22).
The robustness (24) of the measurement is also quantified as
a classification confidence value (216). The free-lying cell
evaluation is used by an automated cytology system (500) to
classify a biological specimen slide.

APPARATUS FOR THE IDENTIFICATION OF FREE-LYING CELLS

The invention relates to an automated cytology
system and more particularly to an automated cytology
system that identifies and classifies free-lying cells and
cells having isolated nuclei on a biological specimen
slide.

BACKGROUND OF THE INVENTION 

One goal of a Papanicolaou smear analysis system
is to emulate the well established human review
process which follows standards suggested by The
Bethesda System. A trained cytologist views a slide
at low magnification to identify areas of interest,
then switches to higher magnification where it is
possible to distinguish normal cells from potentially
abnormal ones according to changes in their structure
and context. In much the same way as a human reviews
Papanicolaou smears, it would be desirable for an
automated cytology analysis system to view slides at
low magnification to detect possible areas of
interest, and at high magnification to locate possible
abnormal cells. As a cytologist compares size, shape,
texture, context and density of cells against
established criteria, so it would be desirable to
analyze cells according to pattern recognition
criteria established during a training period.

SUMMARY OF THE INVENTION 
The invention identifies and classifies free-lying
cells and cells having isolated nuclei on a
biological specimen: single cells. Objects that
appear as single cells bear the most significant
diagnostic information in a pap smear. Objects that
appear as single cells may be classified as being
either normal cells, abnormal cells, or artifacts.
The invention also provides a confidence level
indicative of the likelihood that an object has been
correctly identified and classified. The confidence
level allows the rejection of slides having only a few
very confident abnormal cells. The staining
characteristics of the slide are also evaluated. The
invention first acquires an image of the biological
specimen at a predetermined magnification. Objects
found in the image are identified and classified.
This information is used for subsequent slide
classification.

In one embodiment, the invention utilizes a set
of statistical decision processes that identify
potentially neoplastic cells in Papanicolaou-stained
cervical/vaginal smears. The decisions in accordance
with the invention as to whether an individual cell is
normal or potentially neoplastic are used to determine
if a slide is clearly normal or requires human review.
The apparatus of the invention uses nuclear and
cytoplasm detection with classification techniques to
detect and identify free-lying cells and cells having
isolated nuclei. The apparatus of the invention can
detect squamous intraepithelial lesion (SIL) or other
cancer cells.

In addition to the detection and classification
of single cells, the invention measures the specimen
cell population to characterize the slide. Several
stain related features are measured for
objects which are classified as intermediate squamous
cells. Also, many measures are made of the confidence
with which objects are classified at various stages in
the single cell algorithm. All of this information is
used in conjunction with the number of potentially
neoplastic cells to determine a final slide score.
The invention performs three levels of processing:
image segmentation, feature extraction, and object
classification.

Other objects, features and advantages of the
present invention will become apparent to those
skilled in the art through the description of the
preferred embodiment, claims and drawings herein
wherein like numerals refer to like elements.

BRIEF DESCRIPTION OF THE DRAWINGS 
To illustrate this invention, a preferred
embodiment will be described herein with reference to
the accompanying drawings.

Figures 1A, 1B and 1C show the automated cytology
screening apparatus of the invention.

Figure 2 shows the method of the invention to
arrive at a classification result from an image.

Figure 3A shows the segmentation method of the
invention.

Figure 3B shows the contrast enhancement method
of the invention.

Figures 3C and 3D show a plot of pixels vs.
brightness.

Figure 3E shows the dark edge incorporated image
method of the invention.

Figure 3F shows the bright edge removal method of
the invention.

Figures 3G, 3H and 3I show refinement of an image
by small hole removal.

Figure 4A shows the feature extraction and object
classification of the invention.

Figure 4B shows an initial box filter.
Figure 4C shows a stage 1 classifier.
Figure 4D shows a stage 2 classifier.
Figure 4E shows a stage 3 classifier.
Figures 4F and 4G show an error graph.
Figure 5 shows a stain histogram.
Figure 6A shows robust and non-robust objects.
Figure 6B shows a decision boundary.
Figure 6C shows a segmented object.
Figure 7A shows a threshold graph.
Figure 7B shows a binary decision tree.
Figure 8 shows a stage 4 classifier.
Figure 9 shows a ploidy classifier.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT
In a presently preferred embodiment of the
invention, the system disclosed herein is used in a
system for analyzing cervical pap smears, such as that
shown and disclosed in U.S. Patent Application Serial
No. 07/838,064, entitled "Method For Identifying
Normal Biomedical Specimens", by Alan C. Nelson, et
al., filed February 18, 1992; U.S. Patent Application
Serial No. 08/179,812 filed January 10, 1994 which is
a continuation in part of U.S. Patent Application
Serial No. 07/838,395, entitled "Method For
Identifying Objects Using Data Processing Techniques",
by S. James Lee, et al., filed February 18, 1992; U.S.
Patent Application Serial No. 07/838,070, now U.S.
Pat. No. 5,315,700, entitled "Method And Apparatus For
Rapidly Processing Data Sequences", by Richard S.
Johnston, et al., filed February 18, 1992; U.S. Patent
Application Serial No. 07/838,065, filed 02/18/92,
entitled "Method and Apparatus for Dynamic Correction
of Microscopic Image Signals" by Jon W. Hayenga, et
al.; and U.S. Patent Application Serial No.
08/302,355, filed September 7, 1994 entitled "Method
and Apparatus for Rapid Capture of Focused Microscopic
Images" to Hayenga, et al., which is a
continuation-in-part of Application Serial No. 07/838,063 filed on
February 18, 1992 the disclosures of which are
incorporated herein, in their entirety, by the
foregoing references thereto.

The present invention is also related to
biological and cytological systems as described in the
following patent applications which are assigned to
the same assignee as the present invention, filed on
September 20, 1994 unless otherwise noted, and which
are all hereby incorporated by reference including
U.S. Patent Application Serial No. 08/309,118, to Kuan
et al. entitled, "Field Prioritization Apparatus and
Method," U.S. Patent Application Serial No.
08/309,061, to Wilhelm et al., entitled "Apparatus for
Automated Identification of Cell Groupings on a
Biological Specimen," U.S. Patent Application Serial
No. 08/309, US to Meyer et al. entitled "Apparatus for
Automated Identification of Thick Cell Groupings on a
Biological Specimen," U.S. Patent Application Serial
No. 08/309,115 to Lee et al. entitled "Biological
Analysis System Self Calibration Apparatus," U.S.
Patent Application Serial No. 08/308,992, to Lee et
al. entitled "Apparatus for Identification and
Integration of Multiple Cell Patterns," U.S. Patent
Application Serial No. 08/309,063 to Lee et al.
entitled "A Method for Cytological System Dynamic
Normalization," U.S. Patent Application Serial No.
08/309,248 to Rosenlof et al. entitled "Method and
Apparatus for Detecting a Microscope Slide Coverslip,"
U.S. Patent Application Serial No. 08/309,077 to
Rosenlof et al. entitled "Apparatus for Detecting
Bubbles in Coverslip Adhesive," U.S. Patent
Application Serial No. 08/309,931, to Lee et al.
entitled "Cytological Slide Scoring Apparatus," U.S.
Patent Application Serial No. 08/309,148 to Lee et al.
entitled "Method and Apparatus for Image Plane
Modulation Pattern Recognition," U.S. Patent
Application Serial No. 08/309,209 to Oh et al.
entitled "A Method and Apparatus for Robust Biological
Specimen Classification," U.S. Patent Application
Serial No. 08/309,117, to Wilhelm et al. entitled

It is to be understood that the various processes

562 comprises a MOTOROLA 68030 CPU. The motion
controller 504 is comprised of a tray handler 518, a
microscope stage controller 520, a microscope tray
controller 522, and a calibration slide 524. The
motor drivers 526 position the slide under the optics.
A bar code reader 528 reads a barcode located on the
slide 524. A touch sensor 530 determines whether a
slide is under the microscope objectives, and a door
interlock 532 prevents operation in case the doors are
open. Motion controller 534 controls the motor
drivers 526 in response to the central processor 540.
An Ethernet communication system 560 communicates to
a workstation 542 to provide control of the system.
A hard disk 544 is controlled by workstation 550. In
one embodiment, workstation 550 may comprise a SUN
SPARC CLASSIC (TM) workstation. A tape drive 546 is
connected to the workstation 550 as well as a modem
548, a monitor 552, a keyboard 554, and a mouse
pointing device 556. A printer 558 is connected to
the Ethernet 560.

During object identification and classification,
the central computer 540, running a real time
operating system, controls the microscope 511 and the
processor to acquire and digitize images from the
microscope 511. The flatness of the slide may be
checked, for example, by contacting the four corners
of the slide using a computer controlled touch sensor.
The computer 540 also controls the microscope 511
stage to position the specimen under the microscope
objective, and from one to fifteen field of view (FOV)
processors 568 which receive images under control of
the computer 540.

The computer system 540 accumulates results from
the 4x process and performs bubble edge detection,
which ensures that all areas inside bubbles are
excluded from processing by the invention. Imaging
characteristics are degraded inside bubbles and tend
to introduce false positive objects. Excluding these
areas eliminates such false positives.

The apparatus of the invention checks that cover
slip edges are detected and that all areas outside of
the area bounded by cover slip edges are excluded from
image processing by the 20x process. Since the
apparatus of the invention was not trained to
recognize artifacts outside of the cover slipped area,
excluding these areas eliminates possible false
positive results.

The computer system 540 accumulates slide level
20x results for the slide scoring process. The
computer system 540 performs image acquisition and
ensures that 20x images passed to the apparatus of the
invention for processing conform to image quality and
focus specifications. This ensures that no unexpected
imaging characteristics occur.
The invention performs three major steps, all of
which are described in greater detail below:

Step 1 - For each 20x FOV (20x objective
         magnification field of view), the
         algorithm segments potential cell
         nuclei and detects their cytoplasm
         boundaries. This step is called image
         segmentation.

Step 2 - Next, the algorithm measures feature
         values - such as size, shape, density,
         and texture - for each potential cell
         nucleus detected during Step 1. This
         step is called feature extraction.

Step 3 - The algorithm classifies each detected
         object in an FOV using the extracted
         feature values obtained in Step 2.
         This step is called object
         classification. Classification rules
         are defined and derived during
         algorithm training.

In addition to the object classification, other
measures are made during the classification process
which characterize the stain of the slide, and measure
the confidence of classification.

The single cell identification and classification
system of the invention was trained from a cell
library of training slides.

The apparatus of the invention uses multiple
layers of processing. As image data is processed by
the apparatus of the invention, it passes through
various stages, with each stage applying filters and
classifiers which provide finer and finer
discrimination. The result is that most of the
clearly normal cells and artifacts are eliminated by
the early stages of the classifier. The objects that
are more difficult to classify are reserved for the
later and more powerful stages of the classifier.

During classifier development, the computer
system 540 provides the invention with an image and
allocates space for storing the features calculated on
each object and the results of the apparatus of the
invention. The apparatus of the invention identifies
the potential nuclei in the image, computes features
for each object, creates results, and stores the
results in the appropriate location.

During classifier development, the apparatus of
the invention calculates and stores over 100 features
associated with each object to be entered into the
object classifier training database. Additionally,
the apparatus of the invention stores object truth
information provided by expert cytologists for each
object in the training database. Developers use
statistical feature analysis methods to select
features of utility for classifier design. Once
classifiers have been designed and implemented, the
apparatus of the invention calculates the selected
features and uses them to generate classification
results, confidence values, and stain measures.

Refer now to Figure 2, which shows the item
decomposition steps of the invention. In one
embodiment of the invention, the computer system 540
processes a 20x magnification field of view (FOV).
Steps 10, 12, 14 and 18 are functions that apply to
all objects in the image. Steps 20, 22, 24 and 26 are
performed only if certain conditions are met. For
example, stain evaluation 20 takes place only on
objects that are classified as intermediate cells.

The first processing step is image segmentation 
10 that identifies objects of interest, or potential 
cell nuclei, and prepares a mask 15 to identify the 
nucleus and cytoplasm boundaries of the objects. 

Features are then calculated 12 using the 
original image 11, and the mask 15. The features are 
calculated in feature calculation step 12 for each 
object as identified by image segmentation 10. 
Features are calculated only for objects that are at
least ten pixels away from the edge of the image 11.
The feature values computed for objects that are
closer to the edge of the image 11 would be corrupted
because some of the morphological features need more
object area to be calculated accurately.
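
The ten pixel edge rule described above can be sketched as a simple bounding box test. This is only an illustrative sketch: the bounding box representation, the 512 x 512 image size, and the function name far_from_edge are assumptions made for the example and do not come from the specification.

```python
def far_from_edge(bbox, image_shape, margin=10):
    """Return True if an object's bounding box stays at least `margin`
    pixels away from every edge of the image.

    bbox is (row_min, col_min, row_max, col_max) in inclusive pixel
    coordinates; image_shape is (rows, cols). Both are assumed layouts.
    """
    rows, cols = image_shape
    row_min, col_min, row_max, col_max = bbox
    return (row_min >= margin and col_min >= margin
            and row_max <= rows - 1 - margin
            and col_max <= cols - 1 - margin)


# Hypothetical segmented objects in a hypothetical 512 x 512 image.
objects = {
    "a": (12, 15, 40, 42),      # well inside the image
    "b": (3, 100, 20, 120),     # too close to the top edge
    "c": (480, 490, 505, 511),  # touches the right edge
}
kept = [name for name, box in objects.items()
        if far_from_edge(box, (512, 512))]
```

Only objects for which the test holds would be passed on to feature calculation; the others are still counted as segmented but are not classified.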

Based on the feature calculation step 12, each
object is classified in classification step 14 as a
normal cell, an abnormal cell, or an artifact. At
various stages throughout the classification process,
several other measurements are made dependent on the
classification results of the objects:

o The stain evaluation step 20 measures stain
  related features on any object that has been
  identified as an intermediate cell.

o An SIL atypicality process 22 measures the
  confidence of objects that were classified as
  potentially abnormal.

o A robustness process 24 refers to the
  segmentation and classification. The robustness
  process 24 measures identified objects that are
  susceptible to poor classification results
  because they are poorly segmented or their
  feature values lie close to a decision boundary
  in a classifier.

o A miscellaneous measurements process 26 includes
  histograms of confidences from the classifiers,
  histograms of the stain density of objects
  classified as abnormal, or proximity measurements
  of multiple abnormal objects in one image.

The results of the above processes are summarized
in step 18. The numbers of objects classified as
normal, abnormal, or artifact at each classification
stage are counted, and the results from each of the
other measures are totaled. These results are
returned to the system where they are added to the
results of the other processed images. In total,
these form the results of the entire slide.

The 20x magnification images are obtained at a
pixel size of 0.55 x 0.55 microns. The computer 540
stores the address of the memory where the features
computed for the objects in the FOV will be stored.
The computer also stores the address of the memory
location where the results structure resides. This
memory will be filled with the results of the
invention.

The computer system 540 outputs the following set
of data for each field of view:

SEGMENTATION FEATURES 

Four features are reported that characterize the 
segmentation of the image. 

SEGMENTED OBJECT COUNT

The number of objects that were segmented in the
FOV. This number may be different from the
number classified since objects that are too
close to the edge of the FOV are not classified.

OBJECT COUNTS OF INITIAL BOX FILTER

The number of objects rejected by each of the
five stages of the initial box filter.

OBJECT COUNTS OF STAGE 1 CLASSIFIER

The number of objects classified as normal,
abnormal, or artifact by Stage1's box classifier,
and the number classified as normal, abnormal, or
artifact at the end of the Stage1 classifier.
(Six numbers are recorded: three for the results
of the Stage1 box classifier, and three for the
results of the Stage1 classifier.)

OBJECT COUNTS OF STAGE 2 CLASSIFIER

The number of objects classified as normal,
abnormal, or artifact by Stage2's box classifier,
and the number classified as normal, abnormal, or
artifact at the end of the Stage2 classifier.
(Six numbers are recorded: three for the results
of the Stage2 box classifier and three for the
results of the Stage2 classifier.)

OBJECT COUNTS OF STAGE 3 CLASSIFIER

The number of objects classified as normal,
abnormal, or artifact by Stage3's box classifier,
and the number classified as normal, abnormal, or
artifact at the end of the Stage3 classifier.
(Six numbers are recorded: three for the results
of the Stage3 box classifier and three for the
results of the Stage3 classifier.)

OBJECT COUNT OF STAGE 4 CLASSIFIER

The number of objects classified as abnormal by
the Stage4 classifier.

OBJECT COUNTS OF PLOIDY CLASSIFIER

Two values are computed: the number of objects
classified as abnormal by the first stage of the
Ploidy classifier and the number of objects
classified as highly abnormal by the second stage
of the Ploidy classifier.

OBJECT COUNTS OF STAGE 4 + PLOIDY CLASSIFIER

Two values are computed: the number of objects
classified as abnormal by the Stage4 classifier
that were also classified as abnormal by the
first stage of the Ploidy classifier, and the
number of objects classified as abnormal by the
Stage4 classifier that were also classified
highly abnormal by the second stage of the Ploidy
classifier.

STAGE2/STAGE3/STAGE4/PLOIDY ALARM CONFIDENCE HISTOGRAM

Histograms for the alarm confidence of the
Stage2, Stage3, Stage4, and Ploidy alarms
detected in an FOV.

STAGE2/STAGE3 ALARM COUNT HISTOGRAM

Two histograms of the alarm counts for
the Stage2 and Stage3 alarms detected in an FOV.

STAGE2/STAGE3 ALARM IOD HISTOGRAM

Histograms for the Integrated Optical Density
(IOD) of objects classified as abnormal by Stage2
and Stage3 in an FOV.

INTERMEDIATE CELL IOD-SIZE SCATTERGRAMS

Two IOD vs. size scattergrams of the normal
intermediate cells detected in the FOV.

INTERMEDIATE CELL STAIN FEATURES

Six features are accumulated for each object
classified as an intermediate cell. These
features are all stain related and are used as
reference values in the slide level
classification algorithms.

CONTEXTUAL STAGE 1 ALARM

Number of Stage1 alarms within a 200 pixel radius
of a Stage2 alarm in the same FOV.

CONTEXTUAL STAGE 2 ALARM

Number of Stage2 alarms located within a 200
pixel radius of a Stage3 alarm in the same FOV.

ESTIMATED CELL COUNT

An estimate of the number of squamous cells
present in the image.

ATYPICALITY INDEX

An 8x8 array of confidences for all objects sent
to the atypicality classifier.

SEGMENTATION ROBUSTNESS AND CLASSIFICATION DECISIVENESS

A set of confidence measures that an object was
correctly segmented and classified. This
information is available for Stage2 and Stage3
alarms.

SINGLE CELL ADDON FEATURES

A set of eight features for each object
classified as a Stage3 alarm. This information
will be used in conjunction with slide reference
features to gauge the confidence of the Stage3
alarms.
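
As an illustration of the contextual alarm outputs listed above, the following sketch counts lower-stage alarms that lie within a 200 pixel radius of at least one higher-stage alarm in the same FOV. The representation of alarms as (x, y) centroids, the Euclidean distance measure, the "at least one" interpretation, and the function name are all assumptions; the specification states only the radius.

```python
import math


def contextual_alarm_count(lower_alarms, higher_alarms, radius=200.0):
    """Count lower-stage alarms located within `radius` pixels of at
    least one higher-stage alarm. Each alarm is an assumed (x, y)
    centroid in pixel coordinates of the same FOV."""
    return sum(
        1
        for (x, y) in lower_alarms
        if any(math.hypot(x - hx, y - hy) <= radius
               for (hx, hy) in higher_alarms)
    )


# Hypothetical Stage1 and Stage2 alarm centroids in one FOV.
stage1_alarms = [(0, 0), (150, 0), (500, 500)]
stage2_alarms = [(100, 0)]
contextual_stage1 = contextual_alarm_count(stage1_alarms, stage2_alarms)
```

The same function could be applied to Stage2 and Stage3 alarms to obtain the contextual Stage 2 alarm value.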

Prior to 20x magnification processing, an FOV
selection and integration process is performed on a 4x
magnification scan of the slide to determine the
likelihood that each FOV contains abnormal cells.
Next, the computer system 540 acquires the FOVs in
descending order: from higher likelihood of abnormal
cells to lower likelihood.
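
The descending acquisition order can be sketched as a sort on the likelihood assigned by the 4x scan. The FOV identifiers and likelihood values below are hypothetical, and the dictionary representation is an assumption made for the example.

```python
def acquisition_order(fov_likelihoods):
    """Return FOV identifiers sorted from the highest likelihood of
    containing abnormal cells to the lowest."""
    ranked = sorted(fov_likelihoods.items(),
                    key=lambda item: item[1], reverse=True)
    return [fov_id for fov_id, _ in ranked]


# Hypothetical likelihoods produced by the 4x scan.
order = acquisition_order({"fov_a": 0.12, "fov_b": 0.87, "fov_c": 0.45})
```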

Image segmentation 10 converts gray scale image
data into a binary image of object masks. These masks
represent a group of pixels associated with a
potential cell nucleus. Using these masks, processing
can be concentrated on regions of interest rather than
on individual pixels, and the features that are
computed characterize the potential nucleus.

The image segmentation process 10 is based on
mathematical morphology functions and label
propagation operations. It takes advantage of the
power of nonlinear processing techniques based on set
theoretic concepts of shape and size, which are
directly related to the criteria used by humans to
classify cells. In addition, constraints that are
application specific are incorporated into the
segmentation processes of the invention; these include
object shape, size, dark and bright object boundaries,
background density, and nuclear/cytoplasmic
relationships. The incorporation of
application-specific constraints into the image segmentation 10
process is a unique feature of the AutoPap® 300
System's processing strategy.

Refer now to Figure 3A, which shows the image
segmentation process 10 of the invention in more
detail. The image segmentation process is described
in a U.S. Patent application entitled "Method for
Identifying Objects Using Data Processing Techniques"
by Shih-Jong James Lee. For each image 29, the image
segmentation process 10 creates a mask which uniquely
identifies the size, shape and location of every
object in an FOV. There are three steps involved in
image segmentation 10 after the 20x image data 29 is
received in 20x imaging step 28: contrast enhancement
30, image thresholding 32, and object refinement 34.

During contrast enhancement 30 the apparatus of
the invention first enhances, or normalizes, the
contrast between potential objects of interest and
their backgrounds: bright areas become brighter and
dark areas become darker. This phase of processing
creates an enhanced image 31. During image
thresholding 32 a threshold test identifies objects of
interest and creates a threshold image 33. The
threshold image 33 is applied to the enhanced image 31
to generate three binary mask images. These binary
mask images are further refined and combined by an
object refinement process 34 to identify the size,
shape, and location of objects. The contrast
enhancement process 30 increases the contrast between
pixels that represent the object of interest and
pixels that represent the background.

Refer now to Figure 3B, which shows the contrast
enhancement process 30. The process first normalizes the image
background 36 by pixel averaging. The contrast
enhanced image 31 is derived from the difference
between the original image 29 and the normalized
background 40 computed in enhanced object image
transformation step 44. As part of the image contrast
enhancement process 30, each object in the field of
view undergoes a threshold test 38 using threshold
data 42 to determine whether the brightness of the
object lies within a predetermined range. The
contrast enhancement process stops at step 47.
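
The background normalization and difference steps can be illustrated with a small pure-Python sketch. It is not the patented implementation: the square averaging window, its radius, and the sign convention (background minus original, so that dark nuclei become large positive values) are assumptions chosen for the example.

```python
def box_mean(image, radius):
    """Estimate the background by averaging each pixel with its
    neighbors in a (2*radius+1) square window, clipped at the borders."""
    rows, cols = len(image), len(image[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            window = [
                image[i][j]
                for i in range(max(0, r - radius), min(rows, r + radius + 1))
                for j in range(max(0, c - radius), min(cols, c + radius + 1))
            ]
            out[r][c] = sum(window) / len(window)
    return out


def enhance(image, radius=1):
    """Return the difference between the averaged background and the
    original image (assumed sign: dark objects become positive)."""
    background = box_mean(image, radius)
    return [
        [background[r][c] - image[r][c] for c in range(len(image[0]))]
        for r in range(len(image))
    ]


# A bright 3 x 3 background with one dark pixel standing in for a nucleus.
sample = [[200, 200, 200],
          [200, 20, 200],
          [200, 200, 200]]
enhanced = enhance(sample, radius=1)
```

In this toy image the dark center pixel receives the largest enhanced value, while a perfectly uniform image would map to zero everywhere.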

At this point, the apparatus of the invention
begins to differentiate artifacts from cells so that
artifacts are eliminated from further analysis. The
apparatus of the invention provides a range of
predetermined values for several characteristics,
including but not limited to brightness, size and
shape of nucleus, cytoplasm and background, of the
objects of interest. Objects whose characteristics do
not lie within the range of these values are assumed
to be artifacts and excluded from further
classification.

The brightness of an image is characterized by
histogram functions, shown in Figures 3C and 3D
respectively, which determine how many pixels within
a gray scale FOV have a certain image intensity.
Ideally, the histogram is a curve 48 having three
peaks, as shown in the upper histogram in Figure 3C.
The three peaks correspond to three brightness levels
usually found in the images: the background, the
cytoplasm, and the nuclei. If the number of pixels of
each brightness level were plotted as a histogram, the
largest, brightest peak would be the background since
this usually makes up the largest portion of the image
29. The medium brightness peak would correspond to
the area of cytoplasm, and the darkest and shortest
peak would correspond to the cell nuclei.

This ideal representation rarely occurs since
overlapped cells and cytoplasm tend to distort the
results of the histogram as shown in the lower
histogram 50 in Figure 3D. To reduce the impact of
overlapping cells on brightness calculations, the
apparatus of the invention applies morphological
functions, such as repeated dilations and erosions, to
remove overlapped objects from the image before the
histogram is calculated.

Referring again to Figure 3A, in addition to the
contrast enhanced image 31, a threshold image 33 is
generated by a morphological processing sequence. A
threshold test 32 is then performed on the enhanced
image using the threshold image 33 to produce a binary
image. The threshold test compares each pixel's value
to the threshold image pixel value. The apparatus of
the invention then identifies as an object pixel any
pixel in the enhanced image that has an intensity
greater than the corresponding pixel of the threshold
image.

The threshold image is combined with two
predetermined offset values to generate three
threshold images 135, 137 and 139. The first offset
is subtracted from each gray scale pixel value of the
original threshold image 33 to create a low threshold
image. The second offset value is added to each gray
scale pixel value of the threshold image to create a
high threshold image. Each of these images - medium
threshold, which is the original threshold image, low
threshold, and high threshold - is separately
combined with the enhanced image to provide three
binary threshold images: a low threshold binary image
35; a medium threshold binary image 37; and a high
threshold binary image 39.
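
The construction of the three binary threshold images can be sketched as follows, assuming images are stored as nested lists of gray values and that a pixel is an object pixel when its enhanced value exceeds the corresponding threshold value. The offset values and sample data are hypothetical.

```python
def binary_masks(enhanced, threshold, low_offset, high_offset):
    """Return (low, medium, high) binary masks obtained by comparing
    the enhanced image with the threshold image shifted by
    -low_offset, 0 and +high_offset respectively."""
    def mask(offset):
        return [
            [1 if e > t + offset else 0 for e, t in zip(e_row, t_row)]
            for e_row, t_row in zip(enhanced, threshold)
        ]
    return mask(-low_offset), mask(0), mask(high_offset)


# One row of hypothetical enhanced values against a flat threshold of 15.
enhanced_row = [[5, 12, 20, 30]]
threshold_row = [[15, 15, 15, 15]]
low, medium, high = binary_masks(enhanced_row, threshold_row,
                                 low_offset=5, high_offset=10)
```

With non-negative offsets, every pixel set in the high mask is also set in the medium mask, and every pixel set in the medium mask is also set in the low mask.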

Refer now to Figure 3E where the three binary
threshold images are refined, beginning with the
medium threshold binary image 37. The medium
threshold binary image 37 is refined by eliminating
holes and detecting the dark edges 52 of the objects
of interest in the enhanced image. Dark edges 54 are
linked using a small morphological closing and opening
sequence to fill in holes. Dark edges are detected by
determining where there is a variation in intensity
between a pixel and its neighboring pixels.
Thereafter, boundaries of an edge are detected 56 and
identified as a true dark edge mask. The medium
threshold binary image 37 is then combined in a set
union 58 with the edge boundary detected image 56 to
create a dark edge incorporated image 74.

As illustrated in Figure 3F, bright edges 64 of
the original image are then excluded from the medium
threshold binary image 37. The bright edges of the
enhanced image are detected in a manner similar to
dark edge detection. The boundary of the dark edge
incorporated image 74 is detected and combined with
the bright edge enhanced image 64 in a set
intersection operation 68. The results are subtracted
70 from the dark edge incorporated image 74 to create
a bright edge excluded image 72. The medium threshold
binary image 37 is now represented by the bright edge
excluded image 72.

Refer to Figures 3G, 3H and 3I, which show that
objects 80 from the bright edge excluded image 72 are
completed by filling any holes 82 that remain. Holes
82 can be filled without the side effect of connecting
nearby objects. Small holes 82 are detected and then
added to the original objects 80. To further refine
the medium threshold binary image 37, the bright edge
excluded image 72 is inverted (black becomes white and
vice versa). Objects that are larger than a
predetermined size are identified and excluded from
the image by a connected component analysis operation.
The remaining image is then added to the original
image, which provides the completed medium threshold
binary mask that fills the holes 82.
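
The hole-filling step can be pictured with a minimal Python sketch. It is an illustration under stated assumptions, not the implementation of the apparatus: the function name `fill_small_holes`, the `max_hole_size` parameter, the nested-list mask representation and the choice of 4-connectivity are all hypothetical, since the text only speaks of "a predetermined size" and "a connected component analysis operation".

```python
from collections import deque


def fill_small_holes(mask, max_hole_size):
    """Fill background regions of at most max_hole_size pixels.

    mask is a list of rows holding 0 (background) or 1 (object).
    Scanning the zero pixels is equivalent to scanning the objects
    of the inverted image described in the text.
    """
    rows, cols = len(mask), len(mask[0])
    filled = [row[:] for row in mask]
    seen = [[False] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] or seen[r][c]:
                continue
            # Collect one 4-connected component of the inverted image.
            component = []
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                component.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            # Large components (the surrounding background) are excluded;
            # the small ones are holes and are added to the original mask.
            if len(component) <= max_hole_size:
                for y, x in component:
                    filled[y][x] = 1
    return filled


ring = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 0, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
result = fill_small_holes(ring, max_hole_size=4)
```

Because only the size of each inverted component is tested, a small background pocket touching the image border would also be filled in this sketch; whether the apparatus treats that case differently is not stated in the text.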

To complete the medium threshold binary image 37,
connected objects that may not have been separated
using the bright edge detection process of Figure 3F
are separated. To do so, objects in the medium
threshold binary mask 37 are eroded by a predetermined
amount and then dilated by a second predetermined
amount. The amount of erosion exceeds the amount of
dilation so that objects after dilation are smaller
than before erosion. This separates connected
objects.

A morphological closing residue operation is
applied to determine separation boundaries. A
separation boundary is subtracted from the hole-filled
image to create an overlap object separated binary
image. To ensure that no objects have been lost in
this process, the overlap object separated image is
dilated to generate an object mask. Small objects not
included in the object mask are combined in a set
union with the object separation image to provide an
object recovered image.

Referring again to Figure 3A, in the last step,
the high and low threshold binary images are combined
with the object recovered image (the refined medium
threshold binary image) to create final object masks
41, 43 and 45. All objects identified in the high
threshold binary image 39 are added to the refined
medium threshold binary image 37 using a set union
operation. The resulting mask is eroded by a small
amount and dilated by a large amount, so that all
objects are connected to a single object. This mask
is combined with the low threshold binary mask 35.
Objects in the low threshold binary mask 35 that are
not in close proximity to objects in the medium
threshold binary mask 37 are added to the image.
These objects are added to the refined medium
threshold image 43 to create the finished mask. A
connected components labeling procedure removes small
or oddly shaped objects and assigns a unique label to
each remaining connected object.

The segmented image 15 is used by the feature
extraction process 12 to derive the features for each
object. The features computed are characteristic
measures of the object such as size, shape, density,
and texture. These measurements are input to the
classifiers 14 and allow the apparatus of the
invention to discriminate among normal cells,
potentially abnormal cells, and artifacts. The
features are defined below.

The object classification process 14 consists of
a series of classifiers that are grouped in stages.
Each stage takes potentially abnormal objects from the
previous stage and refines the classification result
further using sets of new features to improve the
accuracy of classification. At any stage, objects
that are classified as normal or artifact are not
classified further.

Now refer to Figure 4A, which shows the classifier
process of the invention. The Initial Box Filter
classifier 90 discards obvious artifacts. The data
then proceeds through the Stage1, Stage2, and Stage3
classifiers 92, 94, 96 and ends with the Stage4 and
Ploidy classifiers 98, 100.

The purpose of the Initial Box Filter classifier
90 is to identify objects that are obviously not cell
nuclei, using as few features as possible, features
that preferably are not difficult to compute. Only
the features required for classification are computed
at this point. This saves processing time over the
whole slide. The initial box filter 90 comprises five
separate classifiers designed to identify various
types of artifacts. The classifiers operate in series
as shown in Figure 4B.

As an object passes through the initial box
filter, it is tested by each classifier shown in
Figure 4B. If it is classified as an artifact, the
object classification 14 is final and the object is
not sent to the other classifiers. If it is not, the
object goes to the next classifier in the series. If
an object is not classified as an artifact by any of
the classifiers 102, 104, 106, 108 and 110, it will go
to the Stage1 classifier 92.

Input to the initial box filter 90 comprises a
set of feature measurements for each object segmented.
The output comprises the following:

o The number of objects classified as artifact by
  each of the classifiers, which results in five
  numbers.

o The Stage1, Stage2, and Stage3 classification
  codes for each object classified as an artifact.

o An "active" flag that indicates whether the
  object has a final classification. If the object
  is classified as an artifact, it is not active
  anymore and will not be sent to other
  classifiers.

The initial box filter 90 uses 15 features, which
are listed in the following table, for artifact
rejection. Each classifier within the initial box
filter 90 uses a subset of these 15 features. The
features are grouped by their properties.

Feature type                        Feature name(s)

Condensed Feature                   condensed_area_percent

Context Texture Feature             big_blur_ave

Contrast Feature                    nc_contrast_orig

Density Features                    mean_orig2
                                    normalized_mean_od_r3
                                    integrated_density_orig
                                    nuc_bright_sm

Nucleus/Cytoplasm Texture
Contrast Feature                    nuc_edge_5_5_sm

Shape Features                      compactness
                                    density_1_2
                                    density_2_3

Size Feature                        perimeter

Texture Features                    sd_orig2
                                    nuc_blur_sd
                                    nuc_edge_9_mag



The initial box filter is divided into five
decision rules. Each decision is based on multiple
features. If the feature value of the object is
outside the range allowed by the decision rule, the
object is classified as an artifact. The decision
rule for each of the initial box filter classifiers is
defined as follows:

Box1 102

if (
    perimeter >= 125 OR
    compactness >= 13 OR
    density_2_3 >= 7.5 OR
    density_1_2 >= 10
)
then
    the object is an artifact.

Box2 104

else if (
    mean_orig2 < 20 OR
    sd_orig2 < 5.3 OR
    sd_orig2 > 22.3
)
then
    the object is an artifact.

Artifact Filter for Unfocused Objects and Polies #1 106

else if (
    nuc_blur_sd < 1.28 OR
    big_blur_ave < ( -1.166 * nuc_blur_sd + 2.89 ) OR
    big_blur_ave < ( 4.58 * condensed_area_percent + 0.8 ) OR
    compactness > ( -0.136 * nuc_edge_9_mag + 18.05 ) OR
    nuc_edge_5_5_sm > ( -1.57 * compactness + 28.59 )
)
then
    the object is an artifact.

Artifact Filter for Graphite #2 108

else if (
    nc_contrast_orig > ( -4.162 * normalized_mean_od_r3 + 615.96 )
)
then
    the object is an artifact.

Artifact Filter for Cytoplasm #3 110

else if (
    integrated_density_orig < ( 433933.2 * nuc_bright_sm - 335429.8 )
)
then
    the object is an artifact.

else
    continue the classification process with the
    Stage1 Box Filter.
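
The five rules above can be sketched as a single Python function. This is a hedged illustration rather than the code of the apparatus: the function name `initial_box_filter`, the dictionary-of-features input and the returned rule names are hypothetical. The thresholds are copied from the rules as printed, reading the garbled value in filter #1 as 1.28 and assuming a multiplication between 433933.2 and nuc_bright_sm in filter #3.

```python
def initial_box_filter(f):
    """Return the name of the rule that rejects the object, or None.

    f maps feature names to measured values (hypothetical interface).
    The rules are tested in series; the first match is final.
    """
    # Box1 102: size and shape limits.
    if (f["perimeter"] >= 125 or f["compactness"] >= 13
            or f["density_2_3"] >= 7.5 or f["density_1_2"] >= 10):
        return "Box1"
    # Box2 104: mean and standard deviation of the original image.
    if f["mean_orig2"] < 20 or f["sd_orig2"] < 5.3 or f["sd_orig2"] > 22.3:
        return "Box2"
    # Filter #1 106: unfocused objects and polies.
    # 1.28 is an assumed reading of the garbled printed value.
    if (f["nuc_blur_sd"] < 1.28
            or f["big_blur_ave"] < -1.166 * f["nuc_blur_sd"] + 2.89
            or f["big_blur_ave"] < 4.58 * f["condensed_area_percent"] + 0.8
            or f["compactness"] > -0.136 * f["nuc_edge_9_mag"] + 18.05
            or f["nuc_edge_5_5_sm"] > -1.57 * f["compactness"] + 28.59):
        return "Filter #1 (unfocused objects and polies)"
    # Filter #2 108: graphite.
    if f["nc_contrast_orig"] > -4.162 * f["normalized_mean_od_r3"] + 615.96:
        return "Filter #2 (graphite)"
    # Filter #3 110: cytoplasm. The "*" is assumed; it is missing in print.
    if f["integrated_density_orig"] < 433933.2 * f["nuc_bright_sm"] - 335429.8:
        return "Filter #3 (cytoplasm)"
    return None  # not an artifact: continue with the Stage1 box filter


# Made-up feature values, chosen only to exercise the rules.
example = {
    "perimeter": 80, "compactness": 11, "density_2_3": 3, "density_1_2": 4,
    "mean_orig2": 60, "sd_orig2": 12, "nuc_blur_sd": 3, "big_blur_ave": 4,
    "condensed_area_percent": 0.1, "nuc_edge_9_mag": 10,
    "nuc_edge_5_5_sm": 5, "nc_contrast_orig": 50,
    "normalized_mean_od_r3": 20, "integrated_density_orig": 200000,
    "nuc_bright_sm": 1.0,
}
verdict = initial_box_filter(example)
```

Ordering the cheapest tests first mirrors the stated goal of rejecting obvious artifacts with as few features as possible.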

Up to 40% of objects that are artifacts are
identified and eliminated from further processing
during the initial box filter 90 processing. This
step retains about 99% of cells, both normal and
potentially abnormal, and passes them to Stage1 92 for
further processing.

Objects that are not classified as artifacts by
the classifiers of the initial box filter 90 are
passed to Stage1 92, which comprises a box filter
classifier and two binary decision tree classifiers as
shown in Figure 4C. The Stage1 box filter 112 is used
to discard objects that are obviously artifacts or
normal cells, using new features which were not
available to the initial box filter 90. The binary
decision trees then attempt to identify the abnormal
cells using a more complex decision process.

The box filter 112 identifies normal cells and
artifacts; the classification of these objects is
final. Objects not classified as normal or artifact
are sent to Classifier#1 114, which classifies the
object as either normal or abnormal. If an object is
classified as abnormal, it is sent to Classifier#2
116, where it is classified as either artifact or
abnormal. Those objects classified as abnormal by
Classifier#2 116 are sent to Stage2 94. Any objects
classified as artifact by any of the classifiers in
Stage1 92 are not sent to other classifiers.
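
The flow through Stage1 can be sketched as follows. This is a minimal illustration, assuming hypothetical callables `box_filter`, `classifier1` and `classifier2` that stand in for box filter 112, Classifier#1 114 and Classifier#2 116; the return convention (a label plus an "active" flag) is likewise an assumption made for the example.

```python
def run_stage1(features, box_filter, classifier1, classifier2):
    """Return (label, active) for one object passing through Stage1.

    box_filter returns "normal", "artifact" or None (undecided);
    classifier1 returns "normal" or "abnormal";
    classifier2 returns "artifact" or "abnormal".
    """
    label = box_filter(features)
    if label is not None:
        return label, False          # final: not sent any further
    if classifier1(features) == "normal":
        return "normal", False       # final
    if classifier2(features) == "artifact":
        return "artifact", False     # final
    return "abnormal", True          # still active: forwarded to Stage2


# Toy stand-ins with made-up feature names, for illustration only.
toy_box = lambda f: "artifact" if f["perimeter"] > 100 else None
toy_c1 = lambda f: "abnormal" if f["score"] > 0 else "normal"
toy_c2 = lambda f: "artifact" if f["junk"] else "abnormal"

outcome = run_stage1({"perimeter": 50, "score": 1, "junk": False},
                     toy_box, toy_c1, toy_c2)
```

Stage2 follows the same pattern with box filter 118 and classifiers 120 and 122, so the same function shape could be reused with different callables.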

The input to Stage1 92 comprises a set of
feature measurements for each object not classified as
an artifact by the initial box filter 90. The output
comprises the following:

o The numbers of objects classified as normal,
  abnormal, and artifact by the Stage1 box
  classifier (3 numbers).

o The numbers of objects which were classified as
  normal, abnormal or artifact at the end of the
  Stage1 classifier 92.

o An "active" flag that indicates whether the
  object has a final classification. If the object
  has been classified as an artifact, it is not
  active anymore and is not sent to other
  classifiers.



The features that are used by each of the Stage1
classifiers 92 are listed in the following tables.
They are categorized by their properties.

Stage1 Box Filter 112

Feature type                        Feature name(s)

Condensed Features                  condensed_count
                                    condensed_area_percent
                                    condensed_compactness

Context Density Feature             mean_background

Context Texture Features            small_blur_ave
                                    big_blur_sd
                                    sm_blur_sd

Contrast Feature                    edge_contrast_orig

Density Feature                     integrated_density_od

Nucleus/Cytoplasm Relation
Feature                             nc_score_r4

Shape Feature                       compactness

Texture Feature                     texture_correlation3



Stage1, Classifier#1 114

Feature type                        Feature name(s)

Condensed Feature                   condensed_count

Context Texture Features            big_blur_ave
                                    small_edge_9_9
                                    big_edge_5_mag
                                    big_edge_9_9
                                    sm_blur_sd

Contrast Feature                    edge_contrast_orig

Density Feature                     autothresh_enh

Nucleus/Cytoplasm Relation
Features                            mod_N_C_ratio
                                    cell_nc_ratio
                                    nc_score_alt_r3

Nucleus/Cytoplasm Texture
Contrast Feature                    nuc_edge_2_mag_big

Shape Features                      compactness2
                                    density_0_1
                                    inertia_2_ratio

Texture Features                    below_autothresh_enh2
                                    cooc_inertia_4_0
                                    sd_orig
                                    nonuniform_run
                                    nuc_edge_2_mag
                                    nuc_blur_sk
                                    sd_enh2
                                    edge_density_r3
                                    cooc_homo_1_0

Stage1, Classifier#2 116

Feature type                        Feature name(s)

Context Density Feature             big_bright

Context Texture Features            big_edge_2_dir
                                    big_edge_9_9

Contrast Feature                    edge_contrast_orig

Density Features                    integrated_density_orig2
                                    normalized_integrated_od
                                    mod_nuc_IOD_sm
                                    mod_nuc_OD_sm
                                    normalized_mean_od

Nucleus/Cytoplasm
Relation Features                   nc_score_r4
                                    cell_semi_isolated
                                    mod_N_C_ratio

Nucleus/Cytoplasm Texture
Contrast Features                   nuc_edge_9_mag_sm
                                    nuc_edge_9_9_big

Shape Feature                       area_inner_edge

Size Feature                        perimeter

Texture Features                    edge_density_r3
                                    nuc_blur_ave
                                    cooc_energy_4_0
                                    cooc_entropy_1_135
                                    nuc_edge_2_dir
                                    cooc_corr_1_90
                                    texture_inertia3

The decision rules used in each classifier are
defined as follows:
Box Filter 112

if (
    integrated_density_od <= 17275.5 AND
    sm_blur_ave <= 4.98465 AND
    edge_contrast_orig <= -42.023
)
then
    the object is normal

else if (
    condensed_count <= 3.5 AND
    compactness <= 10.6828 AND
    sm_blur_ave <= 3.0453 AND
    integrated_density_od <= 19925 AND
    condensed_area_percent > 0.0884
)
then
    the object is an artifact

else if (
    condensed_count <= 3.5 AND
    compactness > 10.6828 AND
    condensed_compactness <= 19.5789
)
then
    the object is an artifact

else if (
    integrated_density_od <= 22374 AND
    big_blur_sd <= 3.92333 AND
    sm_blur_sd <= 1.89516
)
then
    the object is normal

else if (
    integrated_density_od <= 22374 AND
    big_blur_sd <= 3.92333 AND
    sm_blur_sd > 1.89516 AND
    nc_score_r4 <= 0.36755 AND
    texture_correlation3 <= 0.7534 AND
    mean_background > 226.66
)
then
    the object is normal

else if (
    integrated_density_od <= 22374 AND
    big_blur_sd <= 3.92333 AND
    sm_blur_sd > 1.89516 AND
    nc_score_r4 <= 0.36755 AND
    texture_correlation3 > 0.7534
)
then
    the object is normal

else if (
    integrated_density_od <= 10957.5 AND
    big_blur_sd 3.92333 AND
    sm_blur_sd > 1.89516 AND
    nc_score_r4 > 0.36755
)
then
    the object is normal

else
    the object continues the classification process
    in Stage1, Classifier#1.



Stage1, Classifier#1 114

This classifier is a binary decision tree that
uses a linear feature combination at each node to
separate normal cells from abnormal cells. The
features described in the previous tables make up the
linear combination. The features are sent to each
node of the tree. The importance of each feature at
each of the nodes may be different and was determined
during the training process.

Stage1, Classifier#2 116

This classifier is a binary decision tree that
uses a linear feature combination at each node to
separate artifacts from abnormal cells. The features
that make up the tree are listed in a previous table.
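
A binary decision tree with a linear feature combination at each node can be sketched in Python as shown here. The specification does not disclose the trained weights, thresholds or tree shapes, so every number below is invented for illustration, and the `Node`, `Leaf` and `classify` names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass
class Leaf:
    label: str                      # e.g. "normal" or "abnormal"


@dataclass
class Node:
    weights: Dict[str, float]       # per-feature importance at this node
    threshold: float
    below: Union["Node", Leaf]      # taken when the combination <= threshold
    above: Union["Node", Leaf]      # taken when the combination > threshold


def classify(tree, features):
    """Walk the tree, evaluating one linear combination per node."""
    node = tree
    while isinstance(node, Node):
        score = sum(w * features[name] for name, w in node.weights.items())
        node = node.below if score <= node.threshold else node.above
    return node.label


# Invented two-node tree; the weights are NOT the trained values.
toy_tree = Node(
    weights={"compactness2": 1.0, "sd_orig": -0.5},
    threshold=2.0,
    below=Leaf("normal"),
    above=Node(
        weights={"condensed_count": 1.0},
        threshold=3.0,
        below=Leaf("normal"),
        above=Leaf("abnormal"),
    ),
)
label = classify(toy_tree, {"compactness2": 5.0, "sd_orig": 2.0,
                            "condensed_count": 4.0})
```

Storing a separate weight dictionary per node reflects the statement that the importance of each feature may differ from node to node.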

A significant proportion of the objects
classified as abnormal by Stage1 92 are normal cells
and artifacts. Stage2 94 attempts to remove these,
leaving a purer set of abnormal cells. Stage2 94
comprises a box filter 118, which discards objects
that are obviously artifacts or normal cells, and two
binary decision trees shown in Figure 4D.

The objects classified as abnormal by Stage1 92
enter Stage2 94. The box filter 118 identifies normal
cells and artifacts; the classification of these
objects is final. Objects not classified as normal or
artifact are sent to Classifier#1 120, which
classifies the object as either normal or abnormal.
If an object is classified as abnormal, it is sent to
Classifier#2 122, where it is classified as either
artifact or abnormal. Those objects classified as
abnormal by Classifier#2 122 are sent to Stage3 96.
Any objects classified as normal or artifact by one of
the classifiers in Stage2 94 are not sent to other
classifiers.

The input to Stage2 94 comprises a set of
feature measurements for each object classified as
abnormal by Stage1. The output comprises the
following:

o The numbers of objects classified as normal,
  abnormal, and artifact by the box filter (3
  numbers).

o The numbers of objects which were classified as
  normal, abnormal or artifact at the end of the
  Stage2 94 classifier.

o An "active" flag, which indicates whether the
  object has a final classification. (If it has been
  classified as artifact or normal it is not active
  anymore, and will not be sent to other
  classifiers.)

Features Required by the Stage2 94 Classifiers

The features that are used by each of the Stage2
94 classifiers are listed in the following tables.
They are categorized by feature properties.

Stage2 94 Box Filter

Feature type                        Feature name(s)

Condensed Features                  condensed_avg_area
                                    condensed_compactness

Context Density Features            mean_background

Context Texture Features            sm_blur_sd
                                    [illegible]
                                    big_blur_ave
                                    sm_blur_ave

Contrast Feature                    nc_contrast_orig

Density Features                    integrated_density_od
                                    integrated_density_od2
                                    normalized_integrated_od_r3

Shape Features                      shape_score

Texture Features                    nuc_blur_sd
                                    texture_inertia4
                                    texture_range4
                                    edge_density_r3

Stage2 94, Classifier 1

Feature type                        Feature name(s)

Context Texture Features            sm_blur_ave
                                    big_edge_2_dir
                                    big_edge_5_mag
                                    big_blur_ave
                                    big_edge_9_9
                                    big_edge_3_3

Density Feature                     min_od

Shape Feature                       sbx (secondary box test)

Size Features                       area_inner_edge
                                    area
                                    nuclear_max
                                    perimeter2

Texture Features                    nuc_blur_ave
                                    nuc_blur_sk

Stage2 94, Classifier 2

Feature type                        Feature name(s)

Condensed Feature                   condensed_count

Context Density Features            mean_background
                                    mean_outer_od

Contrast Features                   edge_contrast_orig
                                    nc_contrast_orig

Density Features                    nuc_bright_big
                                    mod_nuc_OD_big

Shape Features                      compactness2
                                    density_0_1

Texture Features                    nuc_edge_9_mag
                                    nuc_blur_ave
                                    sd_orig2
                                    nuc_blur_sd
                                    nuc_edge_2_mag



The Stage2 94 classifier comprises a box
filter and two binary decision trees as shown in
Figure 4D. The decision rules used in each classifier
are defined as follows:

Box Filter 118

if (
    condensed_avg_area <= 9.4722 AND
    mean_background > 235.182
)
then
    the object is normal

else if (
    condensed_avg_area > 9.4722 AND
    condensed_compactness <= 30.8997 AND
    nuc_blur_sd <= 5.96505 AND
    mean_background <= 233.45 AND
    compactness > 10.4627 AND
    texture_inertia4 0.3763
)
then
    the object is normal

else if (
    integrated_density_od <= 30253 AND
    condensed_compactness <= 22.0611 AND
    sm_blur_sd <= 6.51617 AND
    shape_score <= 38.8071 AND
    texture_range4 <= 72.5 AND
    integrated_density_od > 15558.5
)
then
    the object is an artifact

else if (
    integrated_density_od <= 26781.5 AND
    edge_density_r3 <= 0.29495 AND
    mean_background > 233.526
)
then
    the object is an artifact

else if (
    integrated_density_od2 <= 23461 AND
    normalized_integrated_od_r3 <= 11176.7 AND
    big_blur_ave <= 5.0609 AND
    nc_contrast_orig > 37.1756 AND
    sm_blur_ave <= 3.0411
)
then
    the object is normal

else
    continue the classification process with Stage2
    94, Classifier#1 120



Stage2 Classifier#1 120

This classifier is a binary decision tree that
uses a linear feature combination at each node to
separate normal cells from abnormal cells. The
features used in the tree are listed in a previous
table.

Stage2 Classifier#2 122

This classifier is a binary decision tree that
uses a linear feature combination at each node to
separate artifacts from abnormal cells. The features
used in the tree are listed in a previous table.

A portion of the objects classified as abnormal
cells by the Stage2 94 classifier are normal cells and
artifacts; therefore, the stage3 96 classifier tries
to remove those, leaving a purer set of abnormal
cells. A box filter discards objects that are
obviously artifacts or normal cells. The box filter
is followed by a binary decision tree shown in Figure
4E.

The objects classified as abnormal by Stage2 94
enter stage3 96. The box filter 124 identifies normal
cells and artifacts; the classification of these
objects is final. Objects not classified as normal or
artifact are sent to the classifier 128, which
classifies the object as either normal/artifact or
abnormal. If an object is classified as abnormal, it
is sent to both stage4 98 and the Ploidy classifiers.
Any objects classified as normal or artifact by one of
the classifiers in stage3 96 are not sent to other
classifiers.

Input to stage3 96 comprises a set of feature
measurements for each object classified as abnormal by
Stage2 94. Outputs comprise the following:

o The numbers of objects classified as normal,
  abnormal, and artifact by the box filter (3
  numbers).

o The number of objects classified as normal,
  abnormal or artifact at the end of the stage3 96
  classifier.

o An "active" flag that indicates whether the
  object has a final classification. If an object
  has been classified as normal or artifact, it
  is not active anymore and will not be sent to
  other classifiers.

The features that are used by each of the stage3
96 classifiers are listed in the following tables.
They are categorized by feature properties.



Stage3 Box Filter 124

Feature type                        Feature name(s)

Condensed Feature                   condensed_area_percent

Context Density Features            mean_background
                                    mean_outer_od

Context Distance Feature            cytoplasm_max

Context Texture Features            big_blur_sk
                                    big_blur_ave
                                    big_edge_2_dir
                                    small_blur_sd

Density Feature                     integrated_density_od

Nucleus/Cytoplasm
Relation Feature                    cell_semi_isolated

Shape Features                      shape_score
                                    density_0_1

Size Features                       perimeter
                                    area

Texture Features                    nonuniform_gray
                                    sd_enh
                                    nuc_blur_sd
                                    texture_range

Stage3 Classifier 128

Feature type                        Feature name(s)

Condensed Feature                   condensed_compactness

Context Density Features            mean_outer_od
                                    mean_background
                                    mean_outer_od_r3

Context Texture Features            big_blur_ave
                                    big_edge_5_mag
                                    sm_edge_9_9

Density Feature                     min_od

Shape Feature                       sbx

Texture Features                    nuc_edge_2_mag
                                    cooc_correlation_1_0
                                    cooc_inertia_2_0
                                    nonuniform_gray



The stage3 96 classifier is composed of a box
filter and a binary decision tree. The decision rules
used in each classifier are as follows:

Box Filter 124

if (
    perimeter <= 54.5 AND
    mean_background <= 225.265 AND
    big_blur_sk > 1.33969 AND
    mean_background <= 214.015
)
then
    the object is an artifact

else if (
    nonuniform_gray <= 44.5557 AND
    big_blur_ave > 2.91694 AND
    area <= 333.5 AND
    sd_enh > 11.7779 AND
    nuc_blur_sd > 3.53022 AND
    cytoplasm_max <= 11.5
)
then
    the object is an artifact

else if (
    nonuniform_gray <= 35.9632 AND
    mean_background <= 225.199 AND
    integrated_density_od <= 31257.5 AND
    texture_range <= 76.5 AND
    condensed_area_percent <= 0.10055
)
then
    the object is an artifact

else if (
    nonuniform_gray <= 44.4472 AND
    mean_background <= 226.63 AND
    integrated_density_od <= 32322.5 AND
    cell_semi_isolated > 0.5
)
then
    the object is an artifact

else if (
    nonuniform_gray <= 44.4472 AND
    mean_background <= 226.63 AND
    integrated_density_od <= 32322.5 AND
    cell_semi_isolated <= 0.5 AND
    shape_score <= 69.4799 AND
    texture_range > 75.5
)
then
    the object is an artifact

if the object was just classified as an artifact
(
    if (
        big_edge_2_dir 0.3891
    )
    then
        the object is abnormal

    else if (
        big_edge_2_dir <= 0.683815 AND
        cytoplasm_max <= 22.5 AND
        mean_background 223.051 AND
        sm_blur_sd <= 4.41098 AND
        mean_outer_od <= 38.6805
    )
    then
        the object is abnormal

    else if (
        big_edge_2_dir <= 0.683815 AND
        density_0_1 > 27.5
    )
    then
        the object is abnormal

    else if (
        area > 337.5 AND
        mean_background > 223.66
    )
    then
        the object is abnormal
)

if the object was classified as abnormal
then
    continue the classification process with the
    stage3 96 Classifier.

Stage3 Classifier 128 

This classifier is a binary decision tree that 
uses a linear feature combination at each node to 
separate normal cells and artifacts from abnormal 
cells. The features are listed in a previous table. 

The main purpose of Stage1-Stage3 is to separate
the populations of normal cells and artifacts from the
abnormal cells. To accomplish this, the decision
boundaries 136 of the classifiers were chosen to
minimize misclassification for both populations, as
shown, for example, in Figure 4F.

The number of normal cells and artifacts on a
given slide is far greater than the number of
abnormal cells, and although the misclassification
rate for those objects is far lower than it is for the
abnormal cells, the population of objects classified
as abnormal by the end of the stage3 96 classifier
still contains some normal cells and artifacts.

For example, assume that the misclassification
rate is 0.1% for normal cells and 10% for abnormal
cells. If a slide contains 20 abnormal cells and
10,000 normal/artifact objects, the objects
classified as abnormal would comprise 0.001 * 10,000
or 10 normal/artifact objects, and 20 * 0.9 or 18
abnormal objects. The noise in the number of abnormal
objects detected at the end of the stage3 96
classifier makes it difficult to recognize abnormal
slides.
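
The arithmetic of this example can be checked with a few lines of Python. The function name `stage3_alarm_counts` and the derived "purity" figure (the fraction of alarms that are truly abnormal) are additions made for illustration; the input numbers are those of the example.

```python
def stage3_alarm_counts(n_abnormal, n_other, other_error_rate,
                        abnormal_miss_rate):
    """Return (false alarms, true alarms, purity of the alarm set)."""
    false_alarms = n_other * other_error_rate
    true_alarms = n_abnormal * (1.0 - abnormal_miss_rate)
    purity = true_alarms / (true_alarms + false_alarms)
    return false_alarms, true_alarms, purity


false_alarms, true_alarms, purity = stage3_alarm_counts(
    n_abnormal=20, n_other=10_000,
    other_error_rate=0.001, abnormal_miss_rate=0.10)
print(round(false_alarms), round(true_alarms), round(purity, 2))
# prints: 10 18 0.64
```

Under these assumed rates, more than a third of the objects reported as abnormal are in fact normal cells or artifacts, which is the noise the following stage is meant to reduce.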

The stage4 98 classifier uses a different
decision making process to remove the last remaining
normal/artifact objects from the abnormal population.
Stage4 98 takes the population existing after stage3
96 and identifies the clearly abnormal population with
a minimum misclassification of the normal cells or
artifacts. To do this, a higher number of the
abnormal cells are missed than was acceptable in the
earlier stages, but the objects that are classified as
abnormal do not have normal cells and artifacts mixed
in. The decision boundary 138 drawn for the stage4 98
classifier is shown in Figure 4G.

Stage4 is made up of two classifiers. The first
classifier was trained with data from stage3 96
alarms. A linear combination of features was
developed that best separated the normal/artifact and
abnormal classes. A threshold was set as shown in
Figure 4G that produced a class containing purely
abnormal cells 130 and a class 134 containing a mix of
abnormal, normal, and artifacts.

The second classifier was trained using the data
that was not classified as abnormal by the first
classifier. A linear combination of features was
developed that best separated the normal/artifact and
abnormal classes. This second classifier is used to
recover some of the abnormal cells lost by the first
classifier.

The input to stage4 98 comprises a set of
feature measurements for each object classified as
abnormal by stage3 96.

The output comprises the classification result
of any object classified as abnormal by stage4 98.

The features that are used by each of the stage4
98 classifiers are listed in the following table.
There are two decision rules that make up the stage4
98 classifier. Each uses a subset of the features
listed.

Feature type                        Feature name(s)

Condensed Features                  condensed_compactness

Context Texture Features            big_blur_ave
                                    nuc_blur_sd_sm
                                    big_edge_5_mag

Density Features                    nuc_bright_big
                                    normalized_integrated_od_r3
                                    normalized_integrated_od

Nucleus/Cytoplasm Texture
Contrast Features                   nuc_edge_9_9_big

Texture Features                    nonuniform_gray
                                    texture_range4
                                    below_autothresh_enh2



Decision Rules of stage4 98

The classifier follows these steps:

1. Create the first linear combination of feature
   values.

2. If the value of the combination is >= a threshold,
   the object is classified as abnormal, otherwise
   it is classified as normal.

3. If the object was classified as normal, create
   the second linear combination.

4. If the value of this second combination is
   greater than a threshold, the object is
   classified as abnormal, otherwise it is
   classified as normal.

combination1 = nonuniform_gray * 2.047321387e-02
             + big_blur_ave * 6.059888005e-01
             + nuc_edge_9_9_big * 8.407871425e-02
             + big_edge_5_mag * -3.132035434e-01
             + nuc_blur_sd_sm * 7.260803580e-01

if combination1 >= 3.06, the object is abnormal.
if combination1 < 3.06, compute combination2:

combination2 = condensed_compactness * 2.957029501e-03
             + nonuniform_gray * 7.682010997e-03
             + below_autothresh_enh2 * 3.975555301e-01
             + nuc_bright_big * -9.175372124e-01
             + normalized_integrated_od_r3 * 4.740774966e-05
             + normalized_integrated_od * 4.612372868e-05
             + texture_range4 * -2.707793610e-03

if combination2 >= -0.13 the object is abnormal.
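
The two-step stage4 rule can be sketched in Python as follows. The coefficients and thresholds are transcribed from the two combinations above, reading the garbled comparison on the first threshold as ">=" (an assumption consistent with the "< 3.06" branch that follows it). The names `COMBINATION1`, `COMBINATION2`, `linear_combination` and `stage4_is_abnormal` are hypothetical, as is the dictionary-of-features interface.

```python
COMBINATION1 = {
    "nonuniform_gray": 2.047321387e-02,
    "big_blur_ave": 6.059888005e-01,
    "nuc_edge_9_9_big": 8.407871425e-02,
    "big_edge_5_mag": -3.132035434e-01,
    "nuc_blur_sd_sm": 7.260803580e-01,
}
COMBINATION2 = {
    "condensed_compactness": 2.957029501e-03,
    "nonuniform_gray": 7.682010997e-03,
    "below_autothresh_enh2": 3.975555301e-01,
    "nuc_bright_big": -9.175372124e-01,
    "normalized_integrated_od_r3": 4.740774966e-05,
    "normalized_integrated_od": 4.612372868e-05,
    "texture_range4": -2.707793610e-03,
}
THRESHOLD1 = 3.06    # comparison assumed to be ">="
THRESHOLD2 = -0.13


def linear_combination(weights, features):
    """Weighted sum of the named feature values."""
    return sum(w * features[name] for name, w in weights.items())


def stage4_is_abnormal(features):
    """Apply the first rule; fall back to the second if it says normal."""
    if linear_combination(COMBINATION1, features) >= THRESHOLD1:
        return True
    return linear_combination(COMBINATION2, features) >= THRESHOLD2


# Made-up feature values: every feature zero except one.
blank = {name: 0.0 for name in set(COMBINATION1) | set(COMBINATION2)}
bright_nucleus = dict(blank, nuc_bright_big=1.0)
decision = stage4_is_abnormal(bright_nucleus)
```

In this sketch the second combination is evaluated only when the first one fails, matching the description of the second classifier as a way to recover abnormal cells missed by the first.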

High grade SIL and cancer cells are frequently
aneuploid, meaning that they contain multiple copies
of sets of chromosomes. As a result, the nuclei of
these abnormal cells stain very dark and therefore
should be easy to recognize. The ploidy classifier
100 uses this stain characteristic to identify
aneuploid cells in the population of cells classified
as abnormal by the stage3 96 classifier. The presence
of these abnormal cells may contribute to the final
decision as to whether the slide needs to be reviewed
by a human or not.

The ploidy classifier 100 is constructed along
the same lines as the stage4 98 classifier: it is
trained on stage3 96 alarms. The difference is that
this classifier is trained specifically to separate
high grade SIL cells from all other cells: normal,
other types of abnormals, or artifacts.

The ploidy classifier 100 is made up of two
simple classifiers. The first classifier was trained
with data from stage3 96 alarms. A linear combination
of features was developed that best separated the
normal/artifact and abnormal classes. A threshold was
set that produced a class containing purely abnormal
cells and a class containing a mix of abnormal,
normal, and artifacts.

The second classifier was trained using the data
classified as abnormal by the first classifier. A
second linear combination was created to separate
aneuploid cells from other types of abnormal cells.

The input to the ploidy classifier 100 comprises
a set of feature measurements for each object
classified as abnormal by stage3 96.

The output comprises the classification
results of any object classified as abnormal by either
classifier in the ploidy classifier 100.

The features used by each of the ploidy
classifiers 100 are listed in the following table.
There are two decision rules that make up the ploidy
classifier 100. Each uses a subset of the features
listed.

Feature type                        Feature name(s)

Context Texture Features            big_edge_5_mag
                                    big_edge_9_9
                                    big_blur_ave

Density Features                    normalized_integrated_od
                                    nuc_bright_big
                                    max_od

Density/Texture Features            auto_mean_diff_orig2

Nucleus/Cytoplasm
Relation Features                   mod_N_C_ratio
                                    nc_score_r4

Texture Features                    nonuniform_gray
                                    texture_range4
                                    nuc_blur_sk



Ploidy 100 Decision Rules

The classifier follows these steps:

1. Create a linear combination of feature values.

2. If the value of the combination is >= a
   threshold, the object is classified as abnormal.

3. If the object was classified as abnormal, create
   a second linear combination.

4. If the value of this second combination is
   greater than a threshold, the object is
   classified as aneuploid, or highly abnormal.

combination1 = nonuniform_gray * 7.005183026e-03
             + auto_mean_diff_orig2 * 1.776645705e-02
             + mod_N_C_ratio * 2.493939400e-01
             + nuc_bright_big * -9.405089021e-01
             + normalized_integrated_od * 2.770500259e-06
             + big_blur_ave * [illegible].802701652e-01
             + big_edge_5_mag * -8.586113900e-02
             + big_edge_9_9 * -1.906895824e-02
             + nuc_blur_sk * -1.124482527e-01
             + max_od * -1.787280198e-03;

if combination1 >= -0.090, the object is classified as
abnormal.

combination2 = big_blur_ave * 2.055980563e-01
             + texture_range4 * -1.174426544e-02
             + nc_score_r4 * 9.785660505e-01;

if combination2 >= 0.63, the object is classified as
aneuploid.
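
The second ploidy rule can be sketched in Python. Only the second combination is coded here, because one coefficient of the first combination is not fully legible in the text. The three coefficients and the 0.63 threshold are those of the second combination; the comparison is assumed to be ">=", and the names `PLOIDY_COMBINATION2` and `is_aneuploid`, along with the `already_abnormal` argument carrying the outcome of the first rule, are hypothetical.

```python
PLOIDY_COMBINATION2 = {
    "big_blur_ave": 2.055980563e-01,
    "texture_range4": -1.174426544e-02,
    "nc_score_r4": 9.785660505e-01,
}
PLOIDY_THRESHOLD2 = 0.63   # comparison assumed to be ">="


def is_aneuploid(features, already_abnormal):
    """Second ploidy rule, applied only to objects the first rule kept."""
    if not already_abnormal:
        return False
    score = sum(w * features[name]
                for name, w in PLOIDY_COMBINATION2.items())
    return score >= PLOIDY_THRESHOLD2


# Made-up feature values for illustration.
sample = {"big_blur_ave": 4.0, "texture_range4": 50.0, "nc_score_r4": 0.5}
flagged = is_aneuploid(sample, already_abnormal=True)
```

With these made-up values the score is about 0.72, so the object is flagged; doubling texture_range4 to 100 lowers the score to about 0.14 and the object is not flagged.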
The ploidy classifier 100 was trained on the same
data set as the stage4 98 classifier: 861 normal cells
or artifacts, and 1654 abnormal cells, composed of 725
low grade SIL, and 929 high grade SIL. All objects
were classified as abnormal by the stage3 96
classifier.

The first classifier correctly identified 31.6%
of the abnormal objects, and mistakenly classified
9.4% of the normal cells and artifacts as abnormal.

The second classifier was trained on all objects
which were classified as abnormal by the first
classifier: 81 normal cells or artifacts, 124 low
grade SIL cells, and 394 high grade SIL cells. The
features were selected to discriminate between low
grade and high grade cells, ignoring the normal cells
and artifacts. The threshold was set using the low
grade, high grade, normal cells and artifacts. It
correctly classified 34.3% of the high grade SIL
cells, and mistakenly classified 14.3% of the low
grade, normal cells or artifacts as abnormal cells.
Stated another way, it classified 26.8% of the
abnormal cells as high grade SIL, and 30.9% of the
normal cells or artifacts as high grade SIL.

The purpose of stain evaluation 20 is to evaluate
the quality of stain for a slide and to aid in the
classification of the slide. The stain evaluation 20
for each FOV is accumulated during the 20x slide scan.
This information is used at the end of the slide scan
to do the following:

o Judge the quality of the stain. If the stain of a
  slide is too different from that of the slides the
  apparatus of the invention was trained on, the
  performance of the classifier may be affected,
  causing objects to be misclassified.

o Aid in the classification of the slide. The stain
  features derived from the intermediate cells may
  be used to normalize other slide features, such as
  the density features measured on objects
  classified as abnormal. This will help verify
  whether the objects classified as abnormal are
  true abnormal cells or false alarms.

Referring again to Figures 2 and 4A, the stain
evaluation process 20 is composed of a classifier to
identify intermediate cells and a set of stain-related
features measured for those cells. Intermediate cells
were chosen for use in the stain evaluation 20 because
they have high prevalence in most slides, they are
easily recognized by the segmentation process, and
their stain quality is fairly even over a slide.

The intermediate cell classifier is run early in
the process of the invention, before the majority of
the normal cells have been removed from consideration
by the classifiers. For this reason, the classifier
takes all of the cells classified as normal from the
Stage1 box classifier 112 and determines whether each
cell is an intermediate cell or not.

The intermediate cell classifier takes all
objects identified as normal cells from the Stage1 box
classifier 112 and determines which are well
segmented, isolated intermediate cells. The
intermediate cells will be used to measure the quality
of staining on the slide, so the classifier to detect
them must recognize intermediate cells regardless of
their density. The intermediate cell classifier
contains no density features, so it is stain
insensitive.

The features used by the intermediate cell 
classifier are listed in the following table. 
Feature type                  Feature name(s)

Nucleus/Cytoplasm             mod_N_C_ratio
Relation Features             nc_score_alt_r4
                              cell_semi_isolated

Nuclear Texture Features      nuc_blur_ave

Context Texture Feature       big_blur_ave

Nuclear Size Feature          area2

Shape Features                compactness
                              area_inner_edge



The intermediate cell classifier is composed of
two classifiers. The first classifier is designed to
find intermediate cells with a very low rate of
misclassification for other cell types. It is so
stringent that it classifies only a tiny percentage
of the intermediate cells on the slide as
intermediate cells.

To expand the set of cells on which to base the
stain measurements, a second classifier was added that
accepts more cells, such that some small number of
cells other than those of intermediate type may be
included in the set.

The following are the decision rules for the
first and second classifiers:

if
( mod_N_C_ratio <= 0.073325 and
  nc_score_alt_r4 <= 0.15115 and
  nuc_blur_ave > 4.6846 and
  big_blur_ave <= 4.5655 and
  area2 > 96.5 and
  cell_semi_isolated > 0.5 and
  compactness <= 10.2183 )

the object is an intermediate cell according to the
first classifier;

if
( mod_N_C_ratio <= 0.073325 and
  nc_score_alt_r4 <= 0.15115 and
  nuc_blur_ave > 4.6846 and
  big_blur_ave <= 4.5655 and
  area2 > 96.5 and
  cell_semi_isolated <= 0.5 and
  area_inner_edge <= 138.5 )

the object is an intermediate cell according to the
second classifier.
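
As an illustration, the two rules can be written as a single Python
function. This is a sketch, not the patented implementation: the
function name and the dictionary-based feature access are assumptions,
and the comparisons that are not ">" are read as "<=", which is also an
assumption about the printed symbols.

```python
def is_intermediate_cell(f):
    """Return (first, second): whether each of the two rules accepts the object.

    f maps feature names to numeric values (hypothetical container).
    """
    # Conditions shared by both rules.
    common = (
        f["mod_N_C_ratio"] <= 0.073325
        and f["nc_score_alt_r4"] <= 0.15115
        and f["nuc_blur_ave"] > 4.6846
        and f["big_blur_ave"] <= 4.5655
        and f["area2"] > 96.5
    )
    # The rules differ only in their last two tests.
    first = (common
             and f["cell_semi_isolated"] > 0.5
             and f["compactness"] <= 10.2183)
    second = (common
              and f["cell_semi_isolated"] <= 0.5
              and f["area_inner_edge"] <= 138.5)
    return first, second
```

Under this reading the two rules are mutually exclusive, since they
test cell_semi_isolated on opposite sides of 0.5.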

The stain score generator 20 takes the objects
identified as intermediate squamous cells by the
Intermediate Cell classifier, fills in histograms
according to cell size and integrated optical density,
and records other stain related features of each cell.

The features used by the stain score generator 21 
are listed in the following table. 



Feature type                  Feature name(s)

Nuclear Optical               integrated_density_od
Density Features              mean_od

Nuclear Size Feature          area

Nucleus/Cytoplasm             nc_contrast_orig
Relation Feature              edge_contrast_orig

Nuclear Texture               sd_orig2
Features                      nuc_blur_ave

Cytoplasm Optical             mean_outer_od_r3
Density Features



Now refer to Figure 5, which shows an example of
a stain histogram 140. The stain histograms 140 are
2-dimensional, with the x-axis representing the size
of the cell, and the y-axis representing the
integrated optical density of the cell. The IOD bins
range from 0 (light) to 7 or 9 (dark). The stain
histogram for the first classifier has 10 IOD bins
while the second has only 8. The size bins range from
0 (large) to 5 (small). There are six size bins
containing the following size cells:

Size Bin      Size Range

0             221+
1             191 - 220
2             161 - 190
3             131 - 160
4             101 - 130
5             0 - 100

The bin ranges for the integrated optical
densities of the cells from the first classifier are
shown in the following table:

Density Bin   Density Range

0             4,000 - 6,000
1             6,001 - 8,000
2             8,001 - 10,000
3             10,001 - 12,000
4             12,001 - 14,000
5             14,001 - 16,000
6             16,001 - 18,000
7             18,001 - 20,000
8             20,001 - 22,000
9             22,001+
The bin ranges for the integrated optical
densities of the cells from the second classifier
are shown in the following table:

Density Bin   Density Range

0             0 - 4,000
1             4,000 - 8,000
2             8,001 - 12,000
3             12,001 - 16,000
4             16,001 - 20,000
5             20,001 - 24,000
6             24,001 - 28,000
7             28,001+

Each object in the image identified as an
intermediate cell is placed in the size/density
histogram according to its area and integrated
optical density. The first histogram includes
objects classified as intermediate cells by the
first classifier. The second histogram includes
objects classified as intermediate cells by either
the first or second classifier.
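
The binning can be sketched as follows. The helper names are
hypothetical, and two assumptions are made that the text does not
state: an IOD below the first listed range is placed in bin 0, and
the histogram is indexed as [size bin][density bin].

```python
import math

# Lower bounds of size bins 0..5 (large to small), from the size table.
SIZE_LOWER_BOUNDS = [221, 191, 161, 131, 101, 0]


def size_bin(area):
    """Size bin 0 (large) to 5 (small) for a non-negative cell area."""
    for b, lower in enumerate(SIZE_LOWER_BOUNDS):
        if area >= lower:
            return b
    raise ValueError("area must be non-negative")


def density_bin(iod, first_bin_upper, width, n_bins):
    """Bin 0 holds values up to first_bin_upper; later bins are `width`
    wide, and the last bin is open-ended."""
    if iod <= first_bin_upper:
        return 0  # assumption: values below the first range also land here
    return min(math.ceil((iod - first_bin_upper) / width), n_bins - 1)


def new_histograms():
    """Empty 6x10 (first classifier) and 6x8 (either classifier) histograms."""
    return ([[0] * 10 for _ in range(6)], [[0] * 8 for _ in range(6)])


def add_cell(hist1, hist2, area, iod, by_first):
    """Record one accepted intermediate cell. by_first says whether the
    first classifier accepted it; the second histogram takes every cell."""
    s = size_bin(area)
    if by_first:
        hist1[s][density_bin(iod, 6000, 2000, 10)] += 1
    hist2[s][density_bin(iod, 4000, 4000, 8)] += 1
```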

The second part of the stain score generator
accumulates several stain measurements for the
objects classified as intermediate cells by either
of the classifiers. The features are:

mean_od
sd_orig2
nc_contrast_orig
mean_outer_od_r3
nuc_blur_ave
edge_contrast_orig



For each of these features, two values are
returned to the computer system 540:

o The cumulative total of the feature values for
  all of the intermediate cells. This will be
  used to compute the mean feature value for all
  cells identified as intermediate cells over the
  whole slide.

o The cumulative total of the squared feature
  values for all of the intermediate cells. This
  will be used with the mean value to compute the
  standard deviation of the feature value for all
  cells identified as intermediate cells over the
  whole slide:

$$ \mathrm{s.d.} = \sqrt{m_2 - \mu^2} $$

where $\mu^2$ is the square of the mean feature
value, and $m_2$ is the mean of the squared feature
values.
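
A minimal sketch of how the two running totals yield the slide-level
mean and standard deviation is given below. The class name is
hypothetical, and the cell count is tracked inside the accumulator as
an assumption; the text does not say where the count is kept.

```python
import math


class FeatureAccumulator:
    """Running totals for one stain feature over all intermediate cells."""

    def __init__(self):
        self.count = 0        # number of cells seen (assumed to be tracked)
        self.total = 0.0      # cumulative total of feature values
        self.total_sq = 0.0   # cumulative total of squared feature values

    def add(self, value):
        self.count += 1
        self.total += value
        self.total_sq += value * value

    def mean(self):
        return self.total / self.count

    def std(self):
        mu = self.mean()
        m2 = self.total_sq / self.count
        # Clamp at zero to guard against tiny negative rounding errors.
        return math.sqrt(max(m2 - mu * mu, 0.0))
```

Only two sums per feature need to be carried across fields of view,
which is why the per-FOV results can be accumulated during the scan
and reduced once at the end.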

Referring again to Figure 2, the SIL
atypicality index 22 is composed of two measures:
(1) an atypicality measure and (2) a probability
density process (pdf) measure. The atypicality
measure indicates the confidence that the object is
truly abnormal. The pdf measure represents how
similar this object is to others in the training
data set. The combination of these two measures is
used to gauge the confidence that an object
identified as abnormal by the Stage2 94 box
classifier is truly abnormal. The highest weight is
given to detected abnormal objects with high
atypicality and pdf measures, the lowest to those
with low atypicality and pdf measures.

As illustrated in Figure 4A, the atypicality
index 22 takes all objects left after the Stage2 94
box filter and subjects them to a classifier.

The following is a list of the features used by
the atypicality index classifier 22:

nonuniform_gray
nuc_edge_2_mag
compactness2
condensed_compactness
texture_correlation3
nuc_bright_big
mean_background
inertia_2_ratio
nc_score_alt_r3
edge_contrast_orig
mod_N_C_ratio
normalized_mean_od_r3
normalized_mean_od
sd_orig
mod_nuc_OD
sm_edge_9_9
big_blur_ave
big_edge_5_mag
cooc_inertia_4_0
min_od
big_edge_9_9
sm_blur_sd
big_edge_2_dir
sm_bright
area_outer_edge
area
nuc_blur_ave
nuc_blur_sd
perimeter
nuc_blur_sd_sm
The following feature array is composed for the
object to be classified:

Feature_Array[0]  = nonuniform_gray
Feature_Array[1]  = nuc_edge_2_mag
Feature_Array[2]  = compactness2
Feature_Array[3]  = condensed_compactness
Feature_Array[4]  = texture_correlation3
Feature_Array[5]  = nuc_bright_big
Feature_Array[6]  = mean_background
Feature_Array[7]  = inertia_2_ratio
Feature_Array[8]  = nc_score_alt_r3
Feature_Array[9]  = edge_contrast_orig
Feature_Array[10] = mod_N_C_ratio
Feature_Array[11] = normalized_mean_od_r3
Feature_Array[12] = normalized_mean_od
Feature_Array[13] = sd_orig
Feature_Array[14] = mod_nuc_OD
Feature_Array[15] = sm_edge_9_9
Feature_Array[16] = big_blur_ave
Feature_Array[17] = big_edge_5_mag
Feature_Array[18] = cooc_inertia_4_0
Feature_Array[19] = min_od
Feature_Array[20] = big_edge_9_9
Feature_Array[21] = sm_blur_sd
Feature_Array[22] = big_edge_2_dir
Feature_Array[23] = sm_bright
Feature_Array[24] = area_outer_edge
Feature_Array[25] = cc.area
Feature_Array[26] = nuc_blur_ave
Feature_Array[27] = nuc_blur_sd
Feature_Array[28] = perimeter
Feature_Array[29] = nuc_blur_sd_sm

The original feature array is used to derive a
new feature vector with 14 elements. Each element
corresponds to an eigenvector of a linear
transformation as determined by discriminant
analysis on the training data set.
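
One plausible reading of this step is a projection of the 30-element
feature array onto 14 eigenvectors. The sketch below assumes exactly
that; the function name is hypothetical and the eigenvector values,
which come from training and are not given in the text, must be
supplied by the caller.

```python
def project(feature_array, eigenvectors):
    """Project a 30-element feature array onto 14 eigenvectors.

    eigenvectors: 14 sequences of 30 coefficients each (values not
    given in the text; supplied by the caller).
    Returns the 14-element derived feature vector.
    """
    if len(feature_array) != 30:
        raise ValueError("expected 30 features")
    if len(eigenvectors) != 14 or any(len(v) != 30 for v in eigenvectors):
        raise ValueError("expected 14 eigenvectors of length 30")
    # Each output element is the dot product with one eigenvector.
    return [sum(c * x for c, x in zip(vec, feature_array))
            for vec in eigenvectors]
```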

The new feature vector is passed to two
classifiers which compute an atypicality index 23
and a pdf index 25. The atypicality index 23
indicates the confidence that the object is truly
abnormal. The pdf index 25 represents how similar
this object is to others in the training data set.

Once the two classification results have been
calculated, they are used to increment a
2-dimensional array for the two measures. The result
returned by each of the classifiers is an integer
between 1 and 8, with 1 being low confidence
and 8 high confidence. The array contains the
atypicality index on the vertical axis, and the pdf
index on the horizontal axis.
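
The bookkeeping for the 2-dimensional array is simple enough to show
directly. The function names are hypothetical, and storing the
vertical (atypicality) axis as the first index is an assumption about
layout only.

```python
def new_index_array():
    """8x8 counter array: rows = atypicality index, columns = pdf index."""
    return [[0] * 8 for _ in range(8)]


def record_indices(array, atypicality_index, pdf_index):
    """Increment the cell for one object; both indices run from 1 to 8."""
    if not (1 <= atypicality_index <= 8 and 1 <= pdf_index <= 8):
        raise ValueError("indices must be integers between 1 and 8")
    # Convert the 1-based indices to 0-based list positions.
    array[atypicality_index - 1][pdf_index - 1] += 1
```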

One indication of a classifier's quality is its
ability to provide the same classification for an
object in spite of small changes in the appearance
or feature measurements of the object. For example,
if the object was re-segmented, and the segmentation
mask changed so that feature values computed using
the segmentation mask changed slightly, the
classification should not change dramatically.

An investigation into the sources of
classification non-repeatability was a part of the
development of the invention. As a result, it was
concluded that there are two major causes of
non-repeatable classification: object presentation
effects and decision boundary effects. As the object
presentation changes, the segmentation changes,
affecting all of the feature measurements, and
therefore, the classification.

Segmentation robustness indicates the
variability of the segmentation mask created for an
object for each of multiple images of the same
object. An object with robust segmentation is one
where the segmentation mask correctly matches the
nucleus and does not vary from image to image in the
case where multiple images are made of the same
object.

The decision boundary effects refer to objects
that have feature values close to the decision
boundaries of the classifier, so small changes in
these features are more likely to cause changes in
the classification result.

Classification decisiveness refers to the
variability in the classification result of an
object as a result of its feature values in
relation to the decision boundaries of the
classifier.

The classification decisiveness measure will be
high if the object's features are far from the
decision boundary, meaning that the classification
result will be repeatable even if the feature values
change by small amounts. Two classifiers were
created to rank the classification robustness of an
object. One measures the classification robustness
as affected by the segmentation robustness. The
other measures the classification robustness as
affected by the classification decisiveness.

The segmentation robustness classifier 24 ranks
how prone the object is to variable segmentation, and
the classification decisiveness classifier 26 ranks
the object in terms of its proximity to a decision
boundary in feature space.

Figure 6A illustrates the effect of object
presentation on segmentation. The AutoPap® 300
System uses a strobe to illuminate the FOV. As a
result, slight variations in image brightness occur
as subsequent images are captured. Objects that
have a very high contrast between the nucleus and
cytoplasm, such as the robust object 142 shown in
Figure 6A, tend to segment the same even when the
image brightness varies. Such objects are
considered to have robust segmentation.

Objects that have low contrast, such as the
first two non-robust objects 144 and 146, are more
likely to segment differently when the image
brightness varies; these objects are considered to
have non-robust segmentation. Another cause of
non-robust segmentation is the close proximity of two
objects, as is shown in the last non-robust object
148. The segmentation tends to be non-robust
because the segmentation process may group the
objects.

Robust segmentation and classification accuracy
have a direct relationship. Objects with robust
segmentation are more likely to have an accurate
segmentation mask, and therefore, the classification
will be more accurate. Objects with non-robust
segmentation are more likely to have inaccurate
segmentation masks, and therefore, the
classification of the object is unreliable. The
segmentation robustness measure is used to identify
the objects with possibly unreliable classification
results.

Figure 6B illustrates the decision boundary
effect. For objects 154 with features in proximity
to decision boundaries 150, a small amount of
variation in feature values could push objects to
the other side of the decision boundary, and the
classification result would change. As a result,
these objects tend to have non-robust classification
results. On the other hand, objects 152 with
features that are far away from the decision
boundary 150 are not affected by small changes in
feature values and are considered to have more
robust classification results.

The segmentation robustness measure is a
classifier that ranks how prone an object is to
variable segmentation. This section provides an
example of variable segmentation and describes the
segmentation robustness measure.

Variable Segmentation Example:

The image segmentation 10 of the invention has 11
steps:

1. Pre-processing
2. Histogram statistics
3. Background normalization
4. Enhanced image generation
5. Thresholding image generation
6. Apply thresholding
7. Dark edge incorporation
8. Bright edge exclusion
9. Fill holes
10. Object separation and recovery
11. High threshold inclusion and low value
    pick up

The areas of the segmentation that are most
sensitive to small changes in brightness or contrast
are steps 7, 8, and 9. Figure 6C illustrates the
operation of these three steps, which in some cases
can cause the segmentation to be non-robust. Line
(a) shows the object 170 to be segmented, which
consists of two objects close together. Line (b)
shows the correct segmentation of the object 172,
174, 176, and 178 through the dark edge
incorporation, bright edge exclusion, and fill holes
steps of the segmentation process respectively.
Line (c) illustrates a different segmentation
scenario for the same object 182, 184, 186 and 188
that would result in an incorrect segmentation of
the object.

The dark edge incorporation step (7) attempts
to enclose the region covered by the nuclear
boundary. The bright edge exclusion step (8)
attempts to separate nuclear objects and
over-segmented artifacts, and the fill hole step (9)
completes the object mask. This process is
illustrated correctly in line (b) of Figure 6C. If
there is a gap in the dark edge boundary, as
illustrated in line (c), the resulting object mask
188 is so different that the object will not be
considered as a nucleus. If the object is low
contrast or the image brightness changes, the
segmentation may shift from the example on line (b)
to that on line (c).

The input to the segmentation robustness
measure comprises a set of feature measurements
for each object classified as abnormal by the second
decision tree classifier of Stage2 94.

The output comprises a number between 0.0
and 1.0 that indicates the segmentation robustness.
Higher values correspond to objects with more robust
segmentation.

The features were analyzed to determine those
most effective in discriminating between objects
with robust and non-robust segmentation. There were
only 800 unique objects in the training set. To
prevent overtraining the classifier, the number of
features that could be used to build a classifier
was limited. The features chosen are listed in the
following table:



Feature type                  Feature name(s)

Context Distance Feature      min_distance

Context Texture Features      context_3a
                              context_1b
                              sm_bright
                              sm_edge_9_9

Nuclear Density Feature       mean_od

Nuclear Texture Features      hole_percent



This classifier is a binary decision tree that
uses a linear feature combination at each node to
separate objects with robust segmentation from those
with non-robust segmentation. The features
described in the following list make up the linear
combination:

Feature_Array[0] = mean_od
Feature_Array[1] = sm_bright
Feature_Array[2] = sm_edge_9_9
Feature_Array[3] = context_3a
Feature_Array[4] = hole_percent
Feature_Array[5] = context_1b
Feature_Array[6] = min_distance

The features that are sent to each node of the
tree are identical, but the importance of each
feature at each of the nodes may be different; the
importance of each feature was determined during the
training process.

The tree that specifies the decision path is
called the Segmentation Robustness Measure
Classifier. It defines the importance of each
feature at each node and the output classification
at each terminal node.
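
The structure described here, a binary tree whose nodes each compare a
weighted sum of the same feature array against a threshold, can be
sketched as below. The class and function names are hypothetical, the
node weights and thresholds are not given in the text, and the
convention that a value below the threshold descends to the left child
is an assumption.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Node:
    """One node of the tree. Terminal nodes carry only `output`."""
    weights: Sequence[float] = ()      # per-feature importance at this node
    threshold: float = 0.0
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    output: Optional[float] = None     # robustness value in [0.0, 1.0]


def evaluate(node, feature_array):
    """Follow the decision path and return the terminal node's output."""
    while node.output is None:
        value = sum(w * x for w, x in zip(node.weights, feature_array))
        # Assumed convention: below the threshold goes left.
        node = node.left if value < node.threshold else node.right
    return node.output
```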
The classification result is a number between
0.0 and 1.0 indicating a general confidence in the
robustness, where 1.0 corresponds to high
confidence.

The classifier was trained using 2373 objects
made up of multiple images of approximately 800
unique objects, where 1344 objects were robust and
1029 were non-robust.

The performance of the classifier is shown in 
the following table: 



                  Robust      Non-Robust

Robust            1128        216
Non-Robust        336         693



The vertical axis represents the true robustness of
the object, and the horizontal axis represents the
classification result. For example, the top row of
the table shows the following:

o 1128 objects with robust segmentation were
  classified correctly as robust.
o 216 objects with robust segmentation were
  classified incorrectly as non-robust.

The classifier correctly identified 77% of the
objects as either having robust or non-robust
segmentation.
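
The 77% figure follows from the four counts reported for the
classifier. The short check below uses only those counts; the variable
names are illustrative.

```python
# (true class, classified as) -> number of objects
confusion = {
    ("robust", "robust"): 1128,
    ("robust", "non-robust"): 216,
    ("non-robust", "robust"): 336,
    ("non-robust", "non-robust"): 693,
}

total = sum(confusion.values())
# Correct classifications lie on the diagonal of the table.
correct = sum(n for (truth, result), n in confusion.items() if truth == result)
accuracy = correct / total   # 1821 / 2373, about 0.767
```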

The confidence measure is derived from the
classification results of the decision tree.
Therefore, using the confidence measures should
provide approximately the same classification
performance as shown in the preceding table.

The classification decisiveness measure
indicates how close the value of the linear
combination of features for an object is to the
decision boundary of the classifier. The
decisiveness measure is calculated from the binary
decision trees used in the final classifiers of
Stage2 94 and stage3 96 by adding information to the
tree to make it a probabilistic tree.

The probabilistic tree assigns probabilities to
the left and right classes at each decision node of
the binary decision tree based on the proximity of
the feature linear combination value to the decision
boundary. When the linear combination value is
close to the decision boundary, both left and right
classes will be assigned a similar low decisiveness
value. When the linear combination value is away
from the decision boundary, the side of the tree
corresponding to the classification decision will
have a high decisiveness value. The combined
probabilities from all the decision nodes are used
to predict the repeatability of classification for
the object.

A probabilistic Fisher's decision tree (PFDT)
is the same as a binary decision tree, with the
addition of a probability distribution in each
non-terminal node. An object classified by a binary
decision tree would follow only one path from the
root node to a terminal node. The object classified
by the PFDT will have a classification result based
on the single path, but the probability of the
object ending in each terminal node of the tree is
also computed, and the decisiveness is based on
those probabilities.

Figures 7A and 7B show how the decisiveness
measure is computed. The object is classified by
the regular binary decision trees used in Stage2 94
and stage3 96. The trees have been modified as
follows. At each decision node, a probability is
computed based on the distance between the object
and the decision boundary.

At the first decision node, these probabilities
are shown as p_1 and 1 - p_1. The feature values of
the objects which would be entering the
classification node are assumed to have a normal
distribution 190. This normal distribution is
centered over the feature value 194, and the value
of p_1 is the area of the normal distribution to the
left of the threshold 192. If the features were
close to the decision boundary, the values of p_1 and
1 - p_1 indicated by area 196 would be approximately
equal. As the feature combination value drifts to
the left of the decision boundary, the value of p_1
increases. Similar probability values are computed
for each decision node of the classification tree as
shown in Figure 7B. The probability associated with
each classification path, the path from the root
node to the terminal node where the classification
result is assigned, is the product of the
probabilities at each branch of the tree. The
probabilities associated with each terminal node are
shown in Figure 7B. For example, the probability of
the object being classified class1 in the left most
branch is p_1 p_2. The probability that the object
belongs to one class is the sum of the probabilities
computed for each terminal node of that class. The
decisiveness measure is the difference between the
probability that the object belongs to class1 and
the probability that it belongs to class2.
$$ P_{class1} = p_1 p_2 + (1 - p_1)(1 - p_3) $$

$$ P_{class2} = p_1 (1 - p_2) + (1 - p_1) p_3 $$

$$ \mathrm{Decisiveness} = | P_{class1} - P_{class2} | $$
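
A small numerical sketch of the decisiveness computation for the
three-node tree of Figure 7B follows. The function names are
hypothetical, and the spread `sigma` of the assumed normal distribution
is a free parameter here because the text does not state how it is
chosen.

```python
import math


def left_probability(value, threshold, sigma):
    """Area to the left of `threshold` under a normal distribution
    centered on `value` with standard deviation `sigma` (assumed)."""
    z = (threshold - value) / (sigma * math.sqrt(2.0))
    return 0.5 * (1.0 + math.erf(z))


def decisiveness(p1, p2, p3):
    """|P_class1 - P_class2| for a root node (p1) with two child
    decision nodes (p2 on the left branch, p3 on the right branch)."""
    p_class1 = p1 * p2 + (1.0 - p1) * (1.0 - p3)
    p_class2 = p1 * (1.0 - p2) + (1.0 - p1) * p3
    return abs(p_class1 - p_class2)
```

When every node sits exactly on its boundary (all probabilities 0.5)
the measure is 0; when the path is certain at every node it is 1.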

The invention computes two classification
decisiveness measures. The first is for objects
classified by the second decision tree classifier of
Stage2 94. The second is for objects classified by
the decision tree classifier of stage3 96. The
classification decisiveness measure is derived as
the object is being classified. The output
comprises the following:

o The classification decisiveness measure for the
  object at Stage2 94 and stage3 96 if the object
  progressed to the stage3 96 classifier. The
  decisiveness measures range from 0.0 to 1.0.
o The product of the classification confidence
  and the classification decisiveness measure for
  the object at Stage2 94 and stage3 96.

The features used for the classification
decisiveness measure are the same as those used for
the second decision tree of Stage2 94 and decision
tree of stage3 96 because the classification
decisiveness measure is produced by the decision
trees.

The decision rules for the classification
decisiveness measure are the same as those used for
the second decision tree of Stage2 94 and decision
tree of stage3 96 because the classification
decisiveness measure is produced by the decision
trees.

Referring again to Figure 2, the miscellaneous
measurements process 26 describes features which are
computed during classification stages of the
invention. They are described here because they can
be grouped together and more easily explained than
they would be in the individual classification stage
descriptions. The following features are described
in this part of the disclosure:

Stage2 Confidence Histogram
Stage3 Confidence Histogram
Stage4 Confidence Histogram
Ploidy Confidence Histogram
Stage2 94 IOD histogram
Stage3 IOD histogram
Contextual Stage1 Alarms
Contextual Stage2 94 Alarms
Addon Feature Information
Estimated Cell Count

Confidence Histograms

When objects on a slide are classified as
alarms, knowing with what confidence the
classifications occurred may help to determine
whether the slide really is abnormal or not.
Therefore, the following alarm confidence histograms
are computed:

o Stage2 94
o Stage3 96
o Stage4 98

Stage2 94

The classifier for Stage2 94, classifier 2, is a
binary decision tree. The measure of confidence for
each terminal node is the purity of the class at
that node based on the training data used to
construct the tree. For example, if a terminal node
was determined to have 100 abnormal objects and 50
normal objects, any object ending in that terminal
node would be classified as an abnormal object, and
the confidence would be (100 + 1) / (150 + 2) or
0.664.
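
The worked example generalizes to a one-line formula. The function
name below is hypothetical; the +1 and +2 terms are taken directly
from the example in the text.

```python
def node_confidence(majority_count, total_count):
    """Purity-based confidence of a terminal node, following the
    (count + 1) / (total + 2) form of the example in the text."""
    return (majority_count + 1) / (total_count + 2)


# 100 abnormal objects out of 150 at the node, as in the example.
example = node_confidence(100, 150)   # 101 / 152
```

The added constants keep the confidence strictly below 1.0 even for a
node that is pure in the training data.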

The 10 bin histogram for Stage2 94 confidences
is filled according to the following confidence
ranges.

Confidence Bin    Confidence Range

0                 0.000 - 0.490
1                 0.500 - 0.690
2                 0.700 - 0.790
3                 0.800 - 0.849
4                 0.850 - 0.874
5                 0.875 - 0.899
6                 0.900 - 0.924
7                 0.925 - 0.949
8                 0.950 - 0.974
9                 0.975 - 1.000
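
A lookup for this table can be sketched with the standard `bisect`
module. The printed ranges leave small gaps (for example between 0.490
and 0.500); the sketch assumes each bin starts at its printed lower
bound and extends up to the next one, which is an interpretation rather
than something the text states. The names are hypothetical.

```python
import bisect

# Lower bounds of bins 1..9; bin 0 starts at 0.000.
STAGE2_BIN_LOWER_EDGES = [0.5, 0.7, 0.8, 0.85, 0.875, 0.9, 0.925, 0.95, 0.975]


def confidence_bin(confidence):
    """Return the histogram bin (0..9) for a confidence in [0.0, 1.0]."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must lie between 0.0 and 1.0")
    # Number of lower edges that are <= confidence gives the bin index.
    return bisect.bisect_right(STAGE2_BIN_LOWER_EDGES, confidence)
```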

Stage3

The confidence of the stage3 96 classifier is
determined in the same manner as the Stage2 94
classifier. The confidence histogram bin ranges are
also the same as for the Stage2 94 classifier.

Stage4

Figure 8 illustrates how the confidence is
computed for the stage4 98 classifier. The
classification process is described in the object
classification 14 Stage4 98 section. If the object
is classified as abnormal at steps 204/203 by the
first classifier that uses the feature combination 1
step 202, the probability is computed in step 210 as
described below. The object will not go to the
second classifier, so the probability for the second
classifier is set to 1.0 in step 212, and the final
confidence is computed in step 216 as the product of
the first and second probabilities. If the object
was classified as normal at step 204 and step 201 by
the first classifier, the probability is computed,
and the object goes to the second classifier that
uses the feature combination 2 step 206. If the
object is classified as abnormal by the second
classifier at step 208 and step 205, the probability
is computed in step 214 for that classifier, and the
final confidence is computed as the product of the
first and second probabilities in step 216. If the
object is classified as normal by the second
classifier, no confidence is reported for the
object.

To determine the confidence of the
classification results in stage4 98, the mean and
standard deviations of the linear combinations of
the normal/artifact and abnormal populations were
calculated from the training data. These
calculations were done for the feature combination 1
step 202 and feature combination 2 step 206. The
results are shown in the following table:

                        Feature          Feature
                        Combination 1    Combination 2

Normal/Artifact mean    2.55             -0.258
Normal/Artifact sd      0.348            0.084
Abnormal mean           2.80             -0.207
Abnormal sd             0.403            0.095



Using the means and standard deviations
calculated, the normal and abnormal likelihoods are
computed for feature combination 1:

$$ \mathrm{normal\_likelihood} = \left( \frac{\mathrm{object\_value} - \mathrm{norm\_pop\_mean}}{\mathrm{norm\_pop\_sd}} \right)^2 $$

$$ \mathrm{abnormal\_likelihood} = \left( \frac{\mathrm{object\_value} - \mathrm{abnorm\_pop\_mean}}{\mathrm{abnorm\_pop\_sd}} \right)^2 $$

Compute the likelihood ratio as:

$$ \mathrm{likelihood\_ratio} = \frac{\mathrm{norm\_pop\_sd}}{\mathrm{abnorm\_pop\_sd}} \exp\left[ -0.5 \left( \mathrm{abnormal\_likelihood} - \mathrm{normal\_likelihood} \right) \right] $$

Normalize the ratio:

$$ \mathrm{prob1} = \frac{\mathrm{likelihood\_ratio}}{1 + \mathrm{likelihood\_ratio}} $$

If the object is classified as normal by the
first classifier and as abnormal by the second
classifier, compute the normalized likelihood ratio
as described previously using the means and standard
deviations from the second feature combination.
This value will be prob2. The confidence value of
an object classified as abnormal by the stage4 98
classifier is the product of prob1 and prob2, and
should range from 0.0 to 1.0 in value. The
confidence value is recorded in a histogram.
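
The confidence computation above can be sketched in Python as follows. This is only an illustration: it reads each likelihood as (object_value - mean)^2 / sd and the ratio as (norm_sd / abnorm_sd) * exp[0.5 (abnormal_likelihood - normal_likelihood)], which is an assumed reading of the printed formulas. The function and constant names are hypothetical and do not come from the AutoPap software.

```python
import math


def normalized_likelihood_ratio(value, norm_mean, norm_sd, abn_mean, abn_sd):
    """Return likelihood_ratio / (1 + likelihood_ratio) for one feature combination.

    Assumption: likelihoods are squared deviations divided by the population sd,
    following the formulas as printed in the text.
    """
    normal_likelihood = (value - norm_mean) ** 2 / norm_sd
    abnormal_likelihood = (value - abn_mean) ** 2 / abn_sd
    ratio = (norm_sd / abn_sd) * math.exp(
        0.5 * (abnormal_likelihood - normal_likelihood)
    )
    return ratio / (1.0 + ratio)


# Population statistics taken from the stage4 table in the text.
COMBINATION_1 = dict(norm_mean=2.55, norm_sd=0.348, abn_mean=2.80, abn_sd=0.403)
COMBINATION_2 = dict(norm_mean=-0.258, norm_sd=0.084, abn_mean=-0.207, abn_sd=0.095)


def confidence(prob1, prob2):
    """Final confidence is the product of the two normalized ratios."""
    return prob1 * prob2
```

Because each normalized ratio lies between 0 and 1, their product also stays within the 0.0 to 1.0 range stated in the text.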

The confidence histogram has 12 bins. Bin[0]
and Bin[11] are reserved for special cases. If the
values computed for combination 1 or combination 2
fall near the boundaries of the values existing in
the training set, then a confident classification
decision cannot be made about the object. If the
feature combination value of the object is at the
high end of the boundary, increment bin[11] by 1. If
the feature combination value is at the low end,
increment bin[0] by 1. The decision rules for these
cases are stated as follows:

if ( combination1 > 4.3 || combination2 > 0.08 )
    stage4 98_prob_hist[11] is incremented.

if ( combination1 < 1.6 || combination2 < -0.55 )
    stage4 98_prob_hist[0] is incremented.

If the feature combination values are within
the acceptable ranges, the object's confidence is
recorded in a histogram with the following bin
ranges:



Confidence Bin    Confidence Range

 1                0.000 - < 0.500
 2                0.500 - < 0.600
 3                0.600 - < 0.700
 4                0.700 - < 0.750
 5                0.750 - < 0.800
 6                0.800 - < 0.850
 7                0.850 - < 0.900
 8                0.900 - < 0.950
 9                0.950 - < 0.975
10                0.975 - 1.000
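
As an illustration of the binning just described, the following Python sketch assigns a stage4 confidence value to one of the 12 histogram bins. The function name and argument layout are hypothetical. Checking the high-end rule before the low-end rule is an assumption, since the text does not say which rule wins if both apply.

```python
# Upper edges of bins 1..9; bin 10 covers 0.975 through 1.000 inclusive.
UPPER_EDGES = [0.500, 0.600, 0.700, 0.750, 0.800, 0.850, 0.900, 0.950, 0.975]


def confidence_bin(confidence, combination1, combination2,
                   low=(1.6, -0.55), high=(4.3, 0.08)):
    """Return the histogram bin index (0..11) for one abnormal object.

    The default limits are the stage4 decision rules quoted in the text.
    """
    # Special cases: feature combinations near the training-set boundaries.
    if combination1 > high[0] or combination2 > high[1]:
        return 11
    if combination1 < low[0] or combination2 < low[1]:
        return 0
    # Regular bins: the first bin whose upper edge exceeds the confidence.
    for index, edge in enumerate(UPPER_EDGES, start=1):
        if confidence < edge:
            return index
    return 10
```

Passing different `low` and `high` limits would let the same sketch serve a classifier with other boundary values.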



Figure 9 illustrates how the confidence is
computed for the ploidy classifier 100. The
classification process is described in the object
classification 14 Ploidy 100 section of this
document. The object is classified at step 222. If
the object is classified as abnormal, "yes" 221, by
the first classifier that uses the feature
combination 1 step 220, the probability is computed
in step 224 described below and prob2 is set to 1.0
at step 226. The object is then sent to the second
classifier. At step 230, if the object was
classified as abnormal, "yes" 231, by the second
classifier that uses the feature combination 2 step
228, the probability is computed for that classifier
at step 232, and the final confidence is computed as
the product of the first and second probabilities in
step 234. If the object is classified as normal by
either the first or the second classifier, no
confidence is reported for the object.

To determine the confidence of the
classification results in the ploidy classifier 100,
the mean and standard deviations of the linear
combinations of the normal and abnormal populations
were calculated from the training data. These
calculations were done for the feature combination 1
step 220 and the feature combination 2 step 228.
The results are shown in the following table:

                        The feature      The feature
                        combination 1    combination 2
                        step 220         step 228

Normal/Artifact mean    2.55             -0.258
Normal/Artifact sd      0.346            0.084
Abnormal mean           2.80             -0.207
Abnormal sd             0.403            0.095



Using the means and standard deviations
calculated, the normal and abnormal likelihoods are
computed for the feature combination 1 step 220:

    normal_likelihood = (object_value - norm_pop_mean)^2 / norm_pop_sd

    abnormal_likelihood = (object_value - abnorm_pop_mean)^2 / abnorm_pop_sd

Compute the likelihood ratio as:

    likelihood_ratio = (norm_pop_sd / abnorm_pop_sd) * exp[0.5 (abnormal_likelihood - normal_likelihood)]

Normalize the ratio:

    prob1 = likelihood_ratio / (1 + likelihood_ratio)

If it goes to Step2, compute the normalized
likelihood ratio as described above using the means
and standard deviations from the second feature
combination. This value will be prob2. The
confidence value of an object classified as abnormal
by the ploidy classifier 100 is the product of prob1
and prob2, and should range from 0.0 to 1.0 in
value. The confidence value is recorded in a
histogram.

The confidence histogram has 12 bins. Bin[0]
and Bin[11] are reserved for special cases. If the
values computed for combination 1 or combination 2
fall near the boundaries of the values existing in
the training set, then a confident classification
decision cannot be made about the object. If the
feature combination value of the object is at the
high end of the boundary, increment bin[11] by 1.
If the feature combination value is at the low end,
increment bin[0] by 1. The decision rules for these
cases are stated as follows.

if ( combination1 < -0.60 || combination2 < -0.30 )
    sil_ploidy_prob_hist[0] is incremented.

if ( combination1 > 0.35 || combination2 > 1.60 )
    sil_ploidy_prob_hist[11] is incremented.

If the feature combination values are within
the acceptable ranges, the object's confidence is
recorded in a histogram with the following bin
ranges:

Confidence Bin    Confidence Range

 1                0.000 - < 0.500
 2                0.500 - < 0.600
 3                0.600 - < 0.700
 4                0.700 - < 0.750
 5                0.750 - < 0.800
 6                0.800 - < 0.850
 7                0.850 - < 0.900
 8                0.900 - < 0.950
 9                0.950 - < 0.975
10                0.975 - 1.000



IOD Histograms

When objects are classified as alarms, it is
useful to know their density. Abnormal cells often
have an excess of nuclear materials, causing them to
stain more darkly. Comparing the staining of the
alarms to the staining of the intermediate cells may
help determine the accuracy of the alarms.

Stage2 94

Each object classified as an abnormal cell by
the Stage2 94 classifier is counted in the alarm IOD
histogram. The ranges of the bins are shown in the
following table:

IOD Bin    Range of Integrated Optical
           Densities per Bin

 0          0      - 11,999
 1         12,000 - 13,999
 2         14,000 - 15,999
 3         16,000 - 17,999
 4         18,000 - 19,999
 5         20,000 - 21,999
 6         22,000 - 23,999
 7         24,000 - 25,999
 8         26,000 - 27,999
 9         28,000 - 29,999
10         30,000 - 31,999
11         32,000 - 33,999
12         34,000 - 35,999
13         36,000 - 37,999
14         38,000 - 39,999
15         40,000 +
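
The IOD bin layout can be expressed compactly in code. The sketch below assumes bin 0 covers 0 to 11,999, bins 1 through 14 are each 2,000 wide starting at 12,000, and bin 15 collects everything from 40,000 upward. The function name is hypothetical.

```python
def iod_bin(iod):
    """Map an integrated optical density value to its alarm histogram bin (0..15)."""
    if iod < 12000:
        return 0
    if iod >= 40000:
        return 15
    # Bins 1..14 are 2,000 wide, beginning at 12,000.
    return (int(iod) - 12000) // 2000 + 1
```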



Stage3 

The stage3 96 alarm IOD histogram is the same
format as the Stage2 94 histogram. It represents
the IOD of each object classified as an abnormal
object by the stage3 96 classifier.

Contextual Alarm Measurements 

Abnormal objects tend to form clusters, so it
is useful to measure how many alarmed objects are
close to other alarmed objects. Specifically, the
following contextual measurements are made:

o Contextual Stage2 94 alarm: the number of
  Stage1 alarms that are close to a Stage2 94
  alarm

o Contextual Stage3 96 alarm: the number of
  Stage2 94 alarms that are close to a stage3 96
  alarm

The distance between alarm objects is the Euclidean
distance:

    sqrt( Δx^2 + Δy^2 )

If a stage3 96 alarm is contained in an image, the
distance between it and any Stage2 94 alarms is
measured. If any are within a distance of 200, they
are considered close and are counted in the cluster2
feature. This feature's value is the number of
Stage2 94 alarms found close to stage3 96 alarms.
The same applies to Stage1 alarms found close to
Stage2 94 alarms for the cluster1 feature.

Each object that is close to a higher alarm
object is counted only once. For example, if a
Stage2 94 alarm is close to two stage3 96 alarms,
the value of cluster2 will be only 1.
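
A minimal Python sketch of the contextual count follows. It assumes each alarm is given as an (x, y) coordinate pair and treats "within a distance of 200" as less than or equal to 200. Both are assumptions, and the function name is hypothetical.

```python
import math


def count_clustered(lower_alarms, higher_alarms, max_distance=200.0):
    """Count lower-stage alarms lying close to at least one higher-stage alarm.

    Each lower-stage alarm is counted at most once, however many
    higher-stage alarms it is close to.
    """
    count = 0
    for lx, ly in lower_alarms:
        if any(math.hypot(lx - hx, ly - hy) <= max_distance
               for hx, hy in higher_alarms):
            count += 1
    return count
```

Called with Stage2 alarms as `lower_alarms` and stage3 alarms as `higher_alarms`, the result corresponds to the count of Stage2 alarms close to stage3 alarms; with Stage1 and Stage2 alarms it corresponds to the other contextual measurement.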
Estimated Cell Count 

The results of the Stage1 classification are
used to estimate the number of squamous cells on the
slide.

If we define the following variables,

    norm = sil_stage1_normal_count1
    abn = sil_stage1_abnormal_count1
    art = sil_stage1_artifact_count1

the estimated cell count is then computed according
to this formula:

    Est_CC = 0.91 + 1.44 (norm) + 0.75 (abn) + 0.26 (art)
             - 0.0021 (norm^2) + 0.083 (abn^2) - 0.0013 (art^2)
             0.015 (norm^2) - 0.043 (norm * abn)
             - 0.016 (art * abn) + 0.0016 (norm * art * abn)

Process performance has been tracked and
validated throughout all stages of classification
training. A cross validation method was adapted for
performance tracking at each stage, in which
training data is randomly divided into five equal
sets. A classifier is then trained by four of the
five sets and tested on the remaining set. Sets are
rotated and the process is repeated until every
combination of four sets has been used for training,
with the remaining set used for testing:

Training data           Test set

sets 1, 2, 3 & 4        5
sets 2, 3, 4 & 5        1
sets 3, 4, 5 & 1        2
sets 4, 5, 1 & 2        3
sets 5, 1, 2 & 3        4

The classification merit (CM) gain is used to
measure the performance of the apparatus of the
invention at each stage:

    CM = Sensitivity / FPR

where Sensitivity is the percentage of abnormal
cells correctly classified as abnormal, and FPR is the
false positive rate, or the percentage of normal
cells and artifacts incorrectly classified as
abnormal cells.
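
The five-set rotation can be illustrated with a short Python sketch. It is not the training code used for the classifiers. The function name, the fixed random seed, and the strided split into near-equal sets are assumptions made for the example.

```python
import random


def five_fold_rotation(samples, seed=0):
    """Yield (training, test) pairs, holding out each of five sets in turn."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)      # random division of the data
    folds = [shuffled[i::5] for i in range(5)]  # five near-equal sets
    for test_index in range(5):
        training = [s for i, fold in enumerate(folds)
                    if i != test_index for s in fold]
        yield training, folds[test_index]
```

The order in which sets are held out differs from the table, but every set still serves as the test set exactly once.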

The objects that were classified as abnormal in 
the previous stage continue to a further stage of 
classification. This stage will refine the 
classification produced by the previous stage, 
eliminating objects that were incorrectly classified 
as abnormal. This increases the CM gain. The goal 
for the apparatus of the invention is CM gain=200. 
CM Calculation Example: 

A typical normal slide might contain 1,000
significant objects that are normal cells. The goal
for the artifact retention rate is 0.2%.

A low prevalence abnormal slide might contain
the same number of normal cells, along with ten
significant single abnormal cells. Of the abnormal
slide's ten significant abnormal objects, it is
expected that the 4x process can select five objects
for processing by the invention. Object
classification 14 that has a 40% abnormal cell
sensitivity reduces this number to 2 (5 x 40% = 2).

    CM = 40% / 0.20% = 200

For process performance, the CM gain is
expected to fall within the range of 200 ± 10, and
sensitivity is expected to be within the bounds of
40 ± 10. Results of cross validated testing for
each stage are illustrated in Table 5.1, which shows
overall CM gain of 192.63 and overall sensitivity of
32.4%, each of which falls within the range of our
goal.
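
The arithmetic of the CM example can be checked with a few lines of Python. The function name is hypothetical, and both inputs are assumed to be expressed in percent.

```python
def cm_gain(sensitivity_percent, fpr_percent):
    """Classification merit gain: sensitivity divided by false positive rate."""
    if fpr_percent <= 0:
        raise ValueError("false positive rate must be positive")
    return sensitivity_percent / fpr_percent


# Worked example from the text: 40% sensitivity, 0.2% false positive rate,
# and five abnormal objects selected by the 4x process.
example_cm = cm_gain(40.0, 0.2)
abnormal_objects_detected = 5 * 40.0 / 100.0
```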

The invention Feature Descriptions 

This section contains names and descriptions of
all features that can be used for object
classification 14. Not all features are used by the
object classification 14 process. Those features
that are used by the invention are listed in feature
sets.

The feature names are taken from the
TwentyXFeatures_s structure in the AutoPap® 300
software implementation.

Items shown in bold face are general
descriptions that explain a set of features. Many
features are variations of similar measures, so an
explanation block may precede a section of similar
features.

Type Feature Description

int label_cc: A unique numeric label assigned
to each segmented object. The object in the upper-left
corner is assigned a value of 1. The remaining
objects are labeled 2, 3, etc. from left to right
and top to bottom.

int x0: Upper left x coord. of the corner of
the box which contains the object region of
interest.

int y0: Upper left y coord. of the corner of
the box which contains the object region of
interest.

int x1: Lower right x coord. of the corner of
the box which contains the object region of
interest.

int y1: Lower right y coord. of the corner of
the box which contains the object region of
interest.

float area: Number of pixels contained in the
labeled region.

float sch: A measure of shape defined as:
    x = x1 - x0 + 1
    y = y1 - y0 + 1
    sch = 100 * abs(x - y) / (x + y)

float sbx: A measure of shape defined as:
    x = x1 - x0 + 1
    y = y1 - y0 + 1
    sbx = 10 * x * y / area

int stage1_label: The classification label
assigned to the object by the stage1 classifier.
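
As a small illustration, the two bounding-box shape measures defined above (sch and sbx) can be computed as follows. The function name is hypothetical; the arithmetic follows the definitions in the feature list.

```python
def shape_measures(x0, y0, x1, y1, area):
    """Return (sch, sbx) for an object with the given bounding box and pixel area."""
    x = x1 - x0 + 1            # bounding-box width in pixels
    y = y1 - y0 + 1            # bounding-box height in pixels
    sch = 100 * abs(x - y) / (x + y)
    sbx = 10 * x * y / area
    return sch, sbx
```

A square bounding box gives sch = 0, and an object that fills its bounding box completely gives sbx = 10.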
int stage2 94_label: The classification label
assigned to the object by the stage2 94 classifier.

int stage3 96_label: The classification label
assigned to the object by the stage3 96 classifier.

float area2: Same feature as area except the
area of interest (labeled region) is first eroded by
a 3x3 element (1-pixel).

float area_inner_edge: Number of pixels in the
erosion residue using a 5x5 element on the labeled
image (2-pixel inner band).

float area_outer_edge: Number of pixels in the
5x5 dilation residue minus a 5x5 closing of the
labeled image (approx. 2-pixel outer band).

float auto_mean_diff_orig2: autothresh_orig2 -
mean_orig2.

float auto_mean_diff_enh2: autothresh_enh2 -
mean_enh2.

float autothresh_enh: These features are
computed in the same way as autothresh_orig except
the enhanced image is used instead of the original
image.

float autothresh_enh2: These features are
computed in the same way as autothresh_orig2 except
the enhanced image is used instead of the original
image.

float autothresh_orig: This computation is based
on the assumption that original image gray scale
values within the nuclear mask are bimodally
distributed. This feature is the threshold that
maximizes the value of "variance-b" given in
equation 18 in the paper by N. Otsu titled "A
threshold selection method from gray-level
histograms", IEEE Trans. on Systems, Man, and
Cybernetics, vol. SMC-9, no. 1, January 1979.

float autothresh_orig2: The same measurement
except gray scale values are considered within a
nuclear mask that has first been eroded by a 3x3
element (1-pixel).

float below_autothresh_enh2: (count of pixels <
autothresh_enh2) / area2

float below_autothresh_orig2: (count of pixels <
autothresh_orig2) / area2

float compactness: perimeter * perimeter / area

float compactness2: perimeter2 * perimeter2 /
area

float compactness_alt: perimeter2 / nuclear_max

Condensed

For the condensed features, condensed pixels are
those whose optical density value is:

    > ftCondensedThreshold * mean_od.

ftCondensedThreshold is a global floating point
variable that can be modified (default is 1.2).

float condensed_percent: Sum of the condensed
pixels divided by the total object area.

float condensed_area_percent: The number of
condensed pixels divided by the total object area.

float condensed_ratio: Average optical density
values of the condensed pixels divided by the
mean_od.

float condensed_count: The number of components
generated from a 4-point connected components
routine on the condensed pixels.

float condensed_avg_area: The average area
(pixel count) of all the condensed components.

float condensed_compactness: The total number of
condensed component boundary pixels squared, divided
by the total area of all the condensed components.

float condensed_distance: The sum of the squared
Euclidean distance of each condensed pixel to the
center of mass, divided by the area.
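
The first three condensed features can be sketched in Python as below. The sketch assumes the object is supplied as a non-empty flat list of optical density values with a nonzero mean, and it reads "sum of the condensed pixels" as the sum of their optical density values. Both are assumptions, and the function name is hypothetical.

```python
def condensed_features(od_pixels, threshold_factor=1.2):
    """Compute condensed_percent, condensed_area_percent and condensed_ratio.

    threshold_factor plays the role of ftCondensedThreshold (default 1.2).
    """
    area = len(od_pixels)
    mean_od = sum(od_pixels) / area
    # Condensed pixels exceed threshold_factor times the mean optical density.
    condensed = [v for v in od_pixels if v > threshold_factor * mean_od]
    features = {
        "condensed_percent": sum(condensed) / area,
        "condensed_area_percent": len(condensed) / area,
        "condensed_ratio": 0.0,
    }
    if condensed:
        condensed_mean = sum(condensed) / len(condensed)
        features["condensed_ratio"] = condensed_mean / mean_od
    return features
```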

float cytoplasm_max: The greatest distance
transform value of the cytoplasm image within each
area of interest. This value is found by doing an
8-connect distance transform of the cytoplasm image,
and then finding the largest value within the
nuclear mask.

float cytoplasm_max_alt: The greatest distance
transform value of the cytoplasm image within each
area of interest. The area of interest for
cytoplasm_max is the labeled image while the area of
interest of cytoplasm_max_alt is the labeled regions
generated from doing a skiz of the labeled image.

float density_0_1: perimeter_out - perimeter

float density_1_2: Difference between the '1'
bin and '2' bin of the histogram described in
perimeter.

float density_2_3: Difference between the '2'
bin and '3' bin of the histogram described in
perimeter.

float density_3_4: Difference between the '3'
bin and '4' bin of the histogram described in
perimeter.

float edge_contrast_orig: First a gray-scale
dilation is calculated on the original image using a
5x5 structure element. The gray-scale residue is
then computed by subtracting the original image from
the dilation. edge_contrast_orig is the mean of the
residue in a 2-pixel outer ring minus the mean of
the residue in a 2-pixel inner ring (the ring refers
to the area of interest; see area_outer_edge).






float integrated_density_enh: Summation of all
gray-scale valued pixels within an area of interest
(values taken from enhanced image). Value is summed
from the conditional histogram of image.

float integrated_density_enh2: The same
measurement as the last one except the area of
interest is first eroded by a 3x3 element (1-pixel).

float integrated_density_od: Summation of all
gray-scale valued pixels within an area of interest
(values taken from the od image). The od (optical
density) image is generated in this routine using
the feature processor to do a look-up table
operation. The table of values used can be found in
the file fov_features.c initialized in the static
int array OdLut.

float integrated_density_od2: The same
measurement as the last one except the area of
interest is first eroded by a 3x3 element (1-pixel).

float integrated_density_orig: Summation of all
gray-scale valued pixels within an area of interest
(values taken from original image). Value is summed
from the conditional histogram of image.

float integrated_density_orig2: The same
measurement as the last one except the area of
interest is first eroded by a 3x3 element (1-pixel).

float mean_background: Calculates the average
gray-scale value for pixels not on the cytoplasm
mask.

float mean_enh: Mean of the gray-scale valued
pixels within an area of interest. Calculated
simultaneously with integrated_density_enh from the
enhanced image.

float mean_enh2: The same measurement as the
last one except the area of interest is first eroded
by a 3x3 element (1-pixel).

float mean_od: The mean of gray-scale values in
the od image within the nuclear mask.

float mean_od2: The same measurement as the last
one except the area of interest is first eroded by a
3x3 element (1-pixel).

float mean_orig: Mean of gray-scale valued
pixels within an area of interest. Calculated
simultaneously with integrated_density_orig from the
original image.

float mean_orig2: The same measurement as
mean_orig except the area of interest is first
eroded by a 3x3 element (1-pixel).

float mean_outer_od: The mean of the optical
density image is found in an area produced by
finding a 5x5 dilation residue minus a 5x5 closing
of the nuclear mask (2-pixel border).

float normalized_integrated_od: First subtract
mean_outer_od from each gray-scale value in the od
image. This produces the "reduced values". Next
find the sum of these reduced values in the area of
the nuclear mask.

float normalized_integrated_od2: The same
summation described with the last feature computed
in the area of the nuclear mask eroded by a 3x3
element (1-pixel).

float normalized_mean_od: Computed with the
reduced values formed during the calculation of
normalized_integrated_od: find the mean of the
reduced values in the nuclear mask.

float normalized_mean_od2: Same calculation as
normalized_mean_od, except the nuclear mask is first
eroded by a 3x3 structure element (1-pixel).

float nc_contrast_orig: Mean of gray-values in
outer ring minus mean_orig2.

float nc_score: Nuclear-cytoplasm ratio. nc_score
= nuclear_max / cytoplasm_max.

float nc_score_alt: Nuclear-cytoplasm
ratio. nc_score_alt = nuclear_max / cytoplasm_max_alt.

float nuclear_max: The greatest 4-connect
distance transform value within each labeled region.
This is calculated simultaneously with perimeter and
compactness using the distance transform image.

float perimeter: A very close approximation to
the perimeter of a labeled region. It is calculated
by doing a 4-connect distance transform, and then a
conditional histogram. The '1' bin of each
histogram is used as the perimeter value.

float perimeter_out: The "outside" perimeter of
a labeled region. It is calculated by doing a
dilation residue of the labeled frame using a 3x3
(1-pixel) element followed by a histogram.

float perimeter2: The average of perimeter and
perimeter_out.

float region_dy_range_enh: The bounding box or
the region of interest is divided into a 3x3 grid (9
elements). If either side of the bounding box is
not evenly divisible by 3, then either the dimension
of the center grid or the 2 outer grids are
increased by one so that there are an integral
number of pixels in each grid space. A mean is
computed for the enhanced image in the area in
common between the nuclear mask and each grid space.
The region's dynamic range is the maximum of the
means for each region minus the minimum of the means
for each region.

float sd_difference: Difference of the two
standard deviations. sd_difference = sd_orig -
sd_enh.

float sd_enh: Standard deviation of pixels in an
area of interest. Calculated simultaneously with
integrated_density_enh from the enhanced image.

float sd_enh2: The same measurement as sd_enh
except the area of interest is first eroded by a 3x3
element (1-pixel).

float sd_orig: Standard deviation of pixels in
an area of interest. Calculated simultaneously with
integrated_density_orig from the original image.

float sd_orig2: The same measurement as sd_orig
except the area of interest is first eroded by a
3x3 element (1-pixel).

float shape_score: Using the 3x3 gridded regions
described in the calculation of region_dy_range_enh,
the mean grayscale value of pixels in the object
mask in each grid is found. Four quantities are
computed from those mean values: H, V, Lr, and Rl.

For H: Three values are computed as the sum of
the means for each row. H is then the maximum row
value - minimum row value.

For V: Same as for H, computed on the vertical
columns of the grid.

For Lr: One value is the sum of the means for
the diagonal running from the top left to the bottom
right. The other two values are computed as the sum
of the three means on either side of this diagonal.
The value of Lr is the maximum - minimum value for
the three regions.

For Rl: Same as Lr, except that the diagonal
runs from bottom-left to top-right.

    shape_score = sqrt( V^2 + H^2 + Lr^2 + Rl^2 )
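
The shape_score description can be sketched in Python, assuming the nine grid means are already available as a 3x3 nested list `m[row][column]`. The function name and the input layout are assumptions made for this example.

```python
import math


def shape_score(m):
    """Compute sqrt(V^2 + H^2 + Lr^2 + Rl^2) from a 3x3 grid of mean values."""
    def spread(values):
        return max(values) - min(values)

    rows = [sum(m[r]) for r in range(3)]
    cols = [sum(m[r][c] for r in range(3)) for c in range(3)]
    h = spread(rows)
    v = spread(cols)
    # Top-left to bottom-right diagonal, plus the three cells on either side.
    lr = spread([m[0][0] + m[1][1] + m[2][2],
                 m[0][1] + m[0][2] + m[1][2],
                 m[1][0] + m[2][0] + m[2][1]])
    # Bottom-left to top-right diagonal, plus the three cells on either side.
    rl = spread([m[2][0] + m[1][1] + m[0][2],
                 m[0][0] + m[0][1] + m[1][0],
                 m[1][2] + m[2][1] + m[2][2]])
    return math.sqrt(v ** 2 + h ** 2 + lr ** 2 + rl ** 2)
```

A uniformly stained object gives a score of 0, while a top-to-bottom gradient contributes through H and both diagonal terms.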

float perim_out_r3: The "outside" perimeter of a
labeled region determined by doing a 4-connect
distance transform of the labeled image. The number
of '1's in each mask are counted to become this
value.

float nc_score_r3: The average value of the
8-connect distance transform of the cytoplasm mask is
found inside the 3x3 dilation residue of the nuclear
mask. Call this value X. The feature is then:
nuclear_max / (X + nuclear_max).

float nc_score_alt_r3: Using "X" as defined in
nc_score_r3, the feature is: area / (3.14 * X * X).

float nc_score_r4: The median value of the
8-connect distance transform of the cytoplasm mask is
found inside the 3x3 dilation residue of the nuclear
mask. This value is always an integer since the
discrete probability density process always crosses
0.5 at the integer values. Call this value Y. The
feature is then: nuclear_max / (Y + nuclear_max).

float nc_score_alt_r4: Using "Y" as defined in
nc_score_r4, the feature is: area / (3.14 * Y * Y).

float mean_outer_od_r3: The mean value of the
optical density image in a 9x9 (4 pixel) dilation
residue minus a 9x9 closing of the nuclear mask.
The top and bottom 20% of the histogram are not used
in the calculation.

float normalized_mean_od_r3: As in
normalized_mean_od except that the values are
reduced by mean_outer_od_r3.

float normalized_integrated_od_r3: As in
normalized_integrated_od except that the values are
reduced by mean_outer_od_r3.

float edge_density_r3: A gray-scale dilation
residue is performed on the original image using a
3x3 element. The feature is the number of pixels >
10 that lie in the 5x5 erosion of the nuclear mask.

Texture

In the following texture features, two global
variables can be modified to adjust their
calculation. ftOccurranceDelta is an integer
specifying the distance between the middle threshold
(mean) and the low threshold, and the middle (mean)
and the high threshold. ftOccurranceOffset is an
integer specifying the number of pixels to "look
ahead" or "look down".

To do texture analysis on adjacent pixels, this
number must be 1. To compute the texture features
the "S" or "co-occurrence matrix" is first defined.
To compute this matrix, the original image is first
thresholded into 4 sets. Currently the thresholds
to determine these four sets are as follows, where M
is the mean_orig: x=1 if x < M-20, x=2 if M-20 <= x < M,
x=3 if M <= x < M+20, x=4 if x >= M+20. The
co-occurrence matrix is computed by finding the number
of transitions between values in the four sets in a
certain direction. Since there are four sets, the
co-occurrence matrix is 4x4. As an example, consider
a pixel of value 1 and its nearest neighbor to the
right which also has the same value. For this
pixel, the co-occurrence matrix for transitions to
the right would therefore increment in the first
row-column. Since pixels outside the nuclear mask
are not analyzed, transitions are not recorded for
the pixels on the edge. Finally, after finding the
number of transitions for each type in the
co-occurrence matrix, each entry is normalized by the
total number of transitions. texture_correlation
and texture_inertia are computed for four
directions: east, southeast, south, and southwest.
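
A minimal Python sketch of the four-level co-occurrence matrix follows. It assumes the image and the nuclear mask are equally sized nested lists, and that a transition is recorded only when both pixels lie inside the mask. The function names are hypothetical; `delta` and the `(dx, dy)` offset stand in for ftOccurranceDelta and ftOccurranceOffset.

```python
def quantize(value, mean, delta=20):
    """Threshold a gray value into sets 1..4 around the mean, as in the text."""
    if value < mean - delta:
        return 1
    if value < mean:
        return 2
    if value < mean + delta:
        return 3
    return 4


def cooccurrence(image, mask, mean, dx=1, dy=0, delta=20):
    """Return the normalized 4x4 co-occurrence matrix for offset (dx, dy).

    Offsets: east (1, 0), southeast (1, 1), south (0, 1), southwest (-1, 1).
    """
    rows, cols = len(image), len(image[0])
    counts = [[0] * 4 for _ in range(4)]
    total = 0
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if not (0 <= r2 < rows and 0 <= c2 < cols):
                continue
            if not (mask[r][c] and mask[r2][c2]):
                continue  # neighbor outside the nuclear mask: no transition
            a = quantize(image[r][c], mean, delta)
            b = quantize(image[r2][c2], mean, delta)
            counts[a - 1][b - 1] += 1
            total += 1
    if total == 0:
        return [[0.0] * 4 for _ in range(4)]
    return [[n / total for n in row] for row in counts]
```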

float texture_correlation: The correlation
process calculation is described on page 187 of
Computer Vision, written by Ballard & Brown,
Prentice-Hall, 1982. Options 2, 3, 4 indicate the
same analysis, except that instead of occurring in
the East direction it occurs in the Southeast, South
or Southwest direction.

float texture_inertia: Also described in
Computer Vision, id.

float texture_range: The difference between the
maximum and minimum gray-scale value in the original
image.

float texture_correlation2: As above, direction
southeast.

float texture_inertia2: As above, direction
southeast.

float texture_range2: As above, direction
southeast.

float texture_correlation3: As above, direction
south.

float texture_inertia3: As above, direction
south.

float texture_range3: As above, direction south.

float texture_correlation4: As above, direction
southwest.

float texture_inertia4: As above, direction
southwest.

float texture_range4: As above, direction
southwest.

cooc

In the following features utilizing the
"co-occurrence" or "S" matrix, the matrix is derived
from the optical density image. To compute this
matrix, the optical density image is first
thresholded into six sets evenly divided between the
maximum and minimum OD value of the cell's nucleus
in question. The S or "co-occurrence matrix" is
computed by finding the number of transitions
between values in the six sets in a certain
direction. Since we have six sets, the
co-occurrence matrix is 6x6. As an example, consider a
pixel of value 1 and its nearest neighbor to the
right, which also has the same value. For this
pixel, the co-occurrence matrix for transitions to
the right would increment in the first row-column.
Since pixels outside the nuclear mask are not
analyzed, transitions are not recorded for the
pixels on the edge. Finally, after finding the
number of transitions for each type in the
co-occurrence matrix, each entry is normalized by the
total number of transitions. The suffixes on these
features indicate the position the neighbor is
compared against. They are as follows:

_1_0: one pixel to the east.
_2_0: two pixels to the east.
_4_0: four pixels to the east.
_1_45: one pixel to the southeast.
_1_90: one pixel to the south.
_1_135: one pixel to the southwest.
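
The suffix convention and the six-level thresholding can be illustrated as follows. The (dx, dy) offsets use image coordinates with y increasing downward, and the set labels run from 1 to 6. Both conventions are assumptions, and the names are hypothetical.

```python
# Neighbor offset (dx, dy) for each feature suffix; y grows toward the south.
SUFFIX_OFFSETS = {
    "1_0": (1, 0),     # one pixel to the east
    "2_0": (2, 0),     # two pixels to the east
    "4_0": (4, 0),     # four pixels to the east
    "1_45": (1, 1),    # one pixel to the southeast
    "1_90": (0, 1),    # one pixel to the south
    "1_135": (-1, 1),  # one pixel to the southwest
}


def six_level(value, od_min, od_max):
    """Assign an OD value to one of six sets evenly dividing [od_min, od_max]."""
    if od_max == od_min:
        return 1
    level = int(6 * (value - od_min) / (od_max - od_min)) + 1
    return min(level, 6)  # the maximum OD value falls in the top set
```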

float cooc_energy_1_0: The square root of the
energy process described in Computer Vision, id.
Refer to the COOC description above for an
explanation of the 1_0 suffix.

float cooc_energy_2_0: Refer to the COOC
description above for an explanation of the 2_0
suffix.

float cooc_energy_4_0: Refer to the COOC
description above for an explanation of the 4_0
suffix.

float cooc_energy_1_45: Refer to the COOC
description above for an explanation of the 1_45
suffix.

float cooc_energy_1_90: Refer to the COOC
description above for an explanation of the 1_90
suffix.

float cooc_energy_1_135: Refer to the COOC
description above for an explanation of the 1_135
suffix.

float cooc_entropy_1_0: The entropy process
defined in Computer Vision, id. Refer to the COOC
description above for an explanation of the 1_0
suffix.

float cooc_entropy_2_0: Refer to the COOC
description above for an explanation of the 2_0
suffix.

float cooc_entropy_4_0: Refer to the COOC
description above for an explanation of the 4_0
suffix.

float cooc_entropy_1_45: Refer to the COOC
description above for an explanation of the 1_45
suffix.

float cooc_entropy_1_90: Refer to the COOC
description above for an explanation of the 1_90
suffix.

float cooc_entropy_1_135: Refer to the COOC
description above for an explanation of the 1_135
suffix.

float cooc_inertia_1_0: The inertia process
defined in Computer Vision, id.

float cooc_inertia_2_0: Refer to the COOC
description above for an explanation of the 2_0
suffix.

float cooc_inertia_4_0: Refer to the COOC
description above for an explanation of the 4_0
suffix.

float cooc_inertia_1_45: Refer to the COOC
description above for an explanation of the 1_45
suffix.

float cooc_inertia_1_90: Refer to the COOC
description above for an explanation of the 1_90
suffix.

float cooc_inertia_1_135: Refer to the COOC
description above for an explanation of the 1_135
suffix.

float cooc_homo_1_0: The homogeneity process described in Computer Vision, id. Refer to the COOC description above for an explanation of the 1_0 suffix.

float cooc_homo_2_0: Refer to the COOC description above for an explanation of the 2_0 suffix.

float cooc_homo_4_0: Refer to the COOC description above for an explanation of the 4_0 suffix.

float cooc_homo_1_45: Refer to the COOC description above for an explanation of the 1_45 suffix.

float cooc_homo_1_90: Refer to the COOC description above for an explanation of the 1_90 suffix.

float cooc_homo_1_135: Refer to the COOC description above for an explanation of the 1_135 suffix.

float cooc_corr_1_0: The correlation process described in Computer Vision, id. Refer to the COOC description above for an explanation of the 1_0 suffix.

float cooc_corr_2_0: Refer to the COOC description above for an explanation of the 2_0 suffix.

float cooc_corr_4_0: Refer to the COOC description above for an explanation of the 4_0 suffix.

float cooc_corr_1_45: Refer to the COOC description above for an explanation of the 1_45 suffix.

float cooc_corr_1_90: Refer to the COOC description above for an explanation of the 1_90 suffix.

float cooc_corr_1_135: Refer to the COOC description above for an explanation of the 1_135 suffix.

Run Length

The next five features are computed using run length features. Similar to the co-occurrence features, the optical density image is first thresholded into six sets evenly divided between the maximum and minimum OD value of the nucleus in question. The run length matrix is then computed from the lengths and orientations of linearly connected pixels of identical gray levels. For example, the upper left corner of the matrix would count the number of pixels of gray level 0 with no horizontally adjacent pixels of the same gray value. The entry to the right of the upper left corner counts the number of pixels of gray level 0 with one horizontally adjacent pixel of the same gray level.

float emphasis_short: The number of runs divided by the length of the run squared:

$$\sum_{i=1}^{\#\,\mathrm{gray}} \sum_{j=1}^{\#\,\mathrm{runs}} \frac{p(i,j)}{j^{2}}$$

p(i,j) is the number of runs with gray level i and length j. This feature emphasizes short runs, or high texture.

float emphasis_long: The product of the number of runs and the run length squared:

$$\sum_{i=1}^{\#\,\mathrm{gray}} \sum_{j=1}^{\#\,\mathrm{runs}} j^{2}\, p(i,j)$$

p(i,j) is the number of runs with gray level i and length j. This feature emphasizes long runs, or low texture.

float nonuniform_gray: The square of the number of runs for each gray level:

$$\sum_{i=1}^{\#\,\mathrm{gray}} \left( \sum_{j=1}^{\#\,\mathrm{runs}} p(i,j) \right)^{2}$$

The process is at a minimum when the runs are equally distributed among gray levels.

float nonuniform_run: The square of the number of runs for each run length:

$$\sum_{j=1}^{\#\,\mathrm{runs}} \left( \sum_{i=1}^{\#\,\mathrm{gray}} p(i,j) \right)^{2}$$

This process is at its minimum when the runs are equally distributed in length.

float percentage_run: The ratio of the total number of runs to the number of pixels in the nuclear mask:

$$\frac{\sum_{i=1}^{\#\,\mathrm{gray}} \sum_{j=1}^{\#\,\mathrm{runs}} p(i,j)}{\#\,\mathrm{pixels}}$$

This feature has a low value when the structure of the object is highly linear.
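
The run length features above can be illustrated with a minimal sketch. It assumes an already quantized gray level image given as a list of rows, counts horizontal runs only, and treats every pixel as part of the nuclear mask. The function names `horizontal_runs` and `run_length_features` are hypothetical, and the sums are left unnormalized as in the descriptions above.

```python
from collections import defaultdict

def horizontal_runs(image):
    """Return p[(gray, length)] = number of horizontal runs (assumed layout)."""
    p = defaultdict(int)
    for row in image:
        start = 0
        for x in range(1, len(row) + 1):
            # A run ends at the row end or where the gray level changes.
            if x == len(row) or row[x] != row[start]:
                p[(row[start], x - start)] += 1
                start = x
    return dict(p)

def run_length_features(p, n_pixels):
    """Compute the five run length features from the run counts p(i, j)."""
    by_gray = defaultdict(int)   # total runs per gray level i
    by_len = defaultdict(int)    # total runs per run length j
    for (gray, length), n in p.items():
        by_gray[gray] += n
        by_len[length] += n
    return {
        "emphasis_short": sum(n / length ** 2 for (_, length), n in p.items()),
        "emphasis_long": sum(n * length ** 2 for (_, length), n in p.items()),
        "nonuniform_gray": sum(v * v for v in by_gray.values()),
        "nonuniform_run": sum(v * v for v in by_len.values()),
        "percentage_run": sum(p.values()) / n_pixels,
    }

image = [
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [1, 1, 1, 1],
]
features = run_length_features(horizontal_runs(image), n_pixels=12)
```

In this toy image there are five runs over twelve pixels, so percentage_run is 5/12; an image made of a few long runs would drive that ratio lower.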

float inertia_2_min_axis: Minimum axis of the 2nd moment of inertia of the nuclear region normalized by the area in pixels.

float inertia_2_max_axis: Maximum axis of the 2nd moment of inertia of the nuclear region normalized by the area in pixels.

float inertia_2_ratio: inertia_2_min_axis / inertia_2_max_axis.

float max_od: Maximum optical density value contained in the nuclear region.

float min_od: Minimum optical density value contained in the nuclear region.

float sd_od: Standard deviation of the optical density values in the nuclear region.

float cell_free_lying: This feature can take on two values: 0.0 and 1.0 (1.0 indicates the nucleus is free lying). To determine if a cell is free lying, a connected components analysis is performed on the cytoplasm image, filtering out any components smaller than 400 pixels or larger in size than the integer variable AlgFreeLyingCytoMax (default is 20000). If only one nucleus bounding box falls inside the bounding box of a labeled cytoplasm, the nucleus (cell) is labeled free lying (1.0); otherwise the nucleus is labeled 0.0.
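
The free lying test can be sketched as follows. This is an illustration only: boxes are assumed to be (x0, y0, x1, y1) tuples, each cytoplasm component is assumed to arrive as an (area, bounding box) pair from a prior connected components step, and "falls inside" is read as full containment. The names `box_inside` and `label_free_lying` are hypothetical.

```python
def box_inside(inner, outer):
    """True when the inner box lies entirely within the outer box."""
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and inner[2] <= outer[2] and inner[3] <= outer[3])

def label_free_lying(nucleus_boxes, cytoplasms, cyto_min=400, cyto_max=20000):
    """Return 1.0 for each nucleus that is alone inside a valid cytoplasm."""
    labels = [0.0] * len(nucleus_boxes)
    for area, cyto_box in cytoplasms:
        # Size filter: 400 pixels and AlgFreeLyingCytoMax (default 20000).
        if area < cyto_min or area > cyto_max:
            continue
        inside = [i for i, box in enumerate(nucleus_boxes)
                  if box_inside(box, cyto_box)]
        if len(inside) == 1:
            labels[inside[0]] = 1.0
    return labels

nuclei = [(10, 10, 20, 20), (110, 10, 120, 20),
          (130, 10, 140, 20), (300, 300, 310, 310)]
cytoplasms = [(900, (0, 0, 30, 30)),        # one nucleus inside
              (2500, (100, 0, 150, 50)),    # two nuclei inside
              (100, (295, 295, 315, 315))]  # too small, filtered out
labels = label_free_lying(nuclei, cytoplasms)
```

Only the first nucleus is labeled 1.0 here: the second cytoplasm holds two nuclei, and the third is removed by the size filter.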

float cell_semi_isolated: This feature can take on two values: 0.0 and 1.0 (1.0 indicates the nucleus is semi-isolated). A nucleus is determined to be semi-isolated when the center of its bounding box is at least a minimum Euclidean pixel distance from all other nuclei (center of their bounding boxes). The minimum distance that is used as a threshold is stored in the global floating-point variable AlgSemiIsolatedDistanceMin on the FOV card (default is 50.0). Only nuclei with the cc.active field non-zero will be used in distance comparisons; non-active cells will be ignored entirely.
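
A minimal sketch of the semi-isolated test follows. It assumes boxes are (x0, y0, x1, y1) tuples, that a distance equal to the threshold counts as isolated, and that non-active nuclei receive 0.0; none of these details are stated in the text. The function name `semi_isolated` is hypothetical.

```python
import math

def semi_isolated(boxes, active, distance_min=50.0):
    """Label each nucleus 1.0 if no other active nucleus is closer than distance_min."""
    centers = [((x0 + x1) / 2.0, (y0 + y1) / 2.0) for x0, y0, x1, y1 in boxes]
    labels = []
    for i, center in enumerate(centers):
        if not active[i]:
            labels.append(0.0)  # assumption: inactive nuclei are never flagged
            continue
        isolated = all(
            math.dist(center, centers[j]) >= distance_min
            for j in range(len(centers))
            if j != i and active[j]  # inactive nuclei are ignored entirely
        )
        labels.append(1.0 if isolated else 0.0)
    return labels

boxes = [(0, 0, 10, 10), (100, 0, 110, 10), (104, 0, 114, 10), (20, 0, 30, 10)]
active = [True, True, True, False]
labels = semi_isolated(boxes, active)
```

The first nucleus is semi-isolated even though the fourth lies only 20 pixels away, because the fourth is inactive and takes no part in the comparison.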



float cell_cyto_area: If the cell has been determined to be free-lying (cell_free_lying = 1.0), this number represents the number of pixels in the cytoplasm (value is approximated due to earlier downsampling). If the cell is not free-lying, this number is 0.0.

float cell_nc_ratio: If the cell has been determined to be free-lying (cell_free_lying = 1.0), this number is cc.area / cell_cyto_area. If the cell is not free-lying, this number is 0.0.

float cell_centroid_diff: This feature is used on free-lying cells. The centroid of the cytoplasm and the centroid of the nucleus are calculated. The feature value is the difference between these two centroids.

Local Area Context Normalization Features

The original image nucleus is assumed to contain information not only about the nucleus, but also about background matter. The gray level recorded at each pixel of the nucleus will be a summation of the optical density of all matter in the vertical column that contains the particular nucleus pixel. In other words, if the nucleus is located in a cytoplasm which itself is located in a mucus stream, the gray level values of the nucleus will reflect not only the nuclear matter, but also the cytoplasm and mucus in which the nucleus lies. To try to measure features of the nucleus without influence of the surroundings and to measure the nucleus surroundings, two regions have been defined around the nucleus. Two regions have been defined because of a lack of information about how much area around the nucleus is enough to identify what is happening in proximity to the nucleus.

The two regions are rings around each nucleus. The first ring expands 5 pixels out from the nucleus (box 7x7 and diamond 4) and is designated as the "small" ring. The second region expands 15 pixels out from the nucleus (box 15x15 and diamond 9) and is called the "big" ring.
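
The ring idea can be sketched in a few lines. This is a simplified illustration, not the box and diamond dilation sequence named above: it assumes a ring is every pixel within a given Chebyshev distance of the nuclear mask that is not itself in the mask. The names `ring_pixels` and `mean_intensity` are hypothetical.

```python
def ring_pixels(mask, radius):
    """Pixels within `radius` of the mask (square neighborhood) but outside it."""
    h, w = len(mask), len(mask[0])
    inside = {(y, x) for y in range(h) for x in range(w) if mask[y][x]}
    ring = set()
    for y, x in inside:
        for dy in range(-radius, radius + 1):
            for dx in range(-radius, radius + 1):
                q = (y + dy, x + dx)
                if 0 <= q[0] < h and 0 <= q[1] < w and q not in inside:
                    ring.add(q)
    return ring

def mean_intensity(image, pixels):
    """Average gray level over a collection of (row, column) pixels."""
    return sum(image[y][x] for y, x in pixels) / len(pixels)

# Toy example: a one-pixel "nucleus" of intensity 2 on a background of 10.
mask = [[1 if (y, x) == (2, 2) else 0 for x in range(5)] for y in range(5)]
image = [[2 if (y, x) == (2, 2) else 10 for x in range(5)] for y in range(5)]
small_ring = ring_pixels(mask, 1)
ring_mean = mean_intensity(image, small_ring)
nuc_to_ring = mean_intensity(image, [(2, 2)]) / ring_mean
```

Dividing the nuclear average by the ring average, as the ratio features in this section do, expresses the nucleus relative to its immediate surroundings rather than in absolute gray levels.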

float sm_bright: Average intensity of the pixels in the small ring as measured in the original image.

float big_bright: Average intensity of the pixels in the big ring as measured in the original image.

float nuc_bright_sm: Average intensity of the nuclear pixels divided by the average intensity of the pixels in the big ring.

float nuc_bright_big: Average intensity of the nuclear pixels divided by the average intensity of the pixels in the small ring.

3x3

The original image is subtracted from a 3x3 closed version of the original. The resultant image is the 3x3 closing residue of the original. This residue gives some indication as to how many dark objects smaller than a 3x3 area exist in the given region.

float sm_edge_3_3: Average intensity of the 3x3 closing residue in the small ring region.

float big_edge_3_3: Average intensity of the 3x3 closing residue in the big ring region.

float nuc_edge_3_3_sm: Average intensity of the 3x3 closing residue in the nuclear region divided by the average intensity of the 3x3 closing residue in the small ring.

float nuc_edge_3_3_big: Average intensity of the 3x3 closing residue in the nuclear region divided by the average intensity of the 3x3 closing residue in the big ring.
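
A closing residue of this kind can be sketched with a plain gray scale closing (a maximum filter followed by a minimum filter over a k x k box). The border handling, which simply clips the window at the image edge, is an assumption, and the names `closing` and `closing_residue` are hypothetical.

```python
def _box_filter(image, k, pick):
    """Apply `pick` (max or min) over a k x k window clipped at the borders."""
    h, w, r = len(image), len(image[0]), k // 2
    return [[pick(image[yy][xx]
                  for yy in range(max(0, y - r), min(h, y + r + 1))
                  for xx in range(max(0, x - r), min(w, x + r + 1)))
             for x in range(w)] for y in range(h)]

def closing(image, k):
    """Gray scale closing: dilation (max) followed by erosion (min)."""
    return _box_filter(_box_filter(image, k, max), k, min)

def closing_residue(image, k):
    """Closed image minus the original; bright where small dark detail was filled."""
    closed = closing(image, k)
    return [[c - o for c, o in zip(crow, orow)]
            for crow, orow in zip(closed, image)]

# A single dark pixel on a bright background is filled by the 3x3 closing.
speck = [[20 if (y, x) == (2, 2) else 100 for x in range(5)] for y in range(5)]
residue = closing_residue(speck, 3)
```

A dark object that already covers a full 3x3 area survives the closing unchanged, so its residue is zero; only detail smaller than the structuring element shows up.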

5x5

The residue of a 5x5 closing of the original image is done similarly to the 3x3 closing residue except that the 3x3 closed image is subtracted from the 5x5 closed image instead of the original. This isolates those objects between 3x3 and 5x5 in size.

float sm_edge_5_5: Average intensity of the 5x5 closing residue in the small ring region.

float big_edge_5_5: Average intensity of the 5x5 closing residue in the big ring region.

float nuc_edge_5_5_sm: Average intensity of the 5x5 closing residue in the nuclear region divided by the average intensity of the 5x5 closing residue in the small ring.

float nuc_edge_5_5_big: Average intensity of the 5x5 closing residue in the nuclear region divided by the average intensity of the 5x5 closing residue in the big ring.

9x9

The residue of a 9x9 closing of the original image is done in the same way as the 5x5 closing residue described above except the 5x5 closing residue is subtracted from the 9x9 residue rather than the 3x3 closing residue.

float sm_edge_9_9: Average intensity of the 9x9 closing residue in the small ring region.

float big_edge_9_9: Average intensity of the 9x9 closing residue in the big ring region.

float nuc_edge_9_9_sm: Average intensity of the 9x9 closing residue in the nuclear region divided by the average intensity of the 9x9 closing residue in the small ring.

float nuc_edge_9_9_big: Average intensity of the 9x9 closing residue in the nuclear region divided by the average intensity of the 9x9 closing residue in the big ring.

2 Mag

To find if an angular component exists as part of the object texture, closing residues are done in the area of interest using horizontal and vertical structuring elements. The information is combined as a magnitude and an angular disparity measure. The first structuring elements used are a 2x1 and 1x2.

float nuc_edge_2_mag: Magnitude of 2x1 and 1x2 closing residues within the nuclei. Square root of ((average horizontal residue)^2 + (average vertical residue)^2).

float sm_edge_2_mag: Magnitude of 2x1 and 1x2 closing residues within the small ring. Square root of ((average horizontal residue)^2 + (average vertical residue)^2).

float big_edge_2_mag: Magnitude of 2x1 and 1x2 closing residues within the big ring. Square root of ((average horizontal residue)^2 + (average vertical residue)^2).

float nuc_edge_2_mag_sm: nuc_edge_2_mag / sm_edge_2_mag.

float nuc_edge_2_mag_big: nuc_edge_2_mag / big_edge_2_mag.

float nuc_edge_2_dir: Directional disparity of 2x1 and 1x2 closing residues within the nuclei. (average vertical residue) / ((average horizontal residue) + (average vertical residue)).

float sm_edge_2_dir: Directional disparity of 2x1 and 1x2 closing residues in the small ring. (average vertical residue) / ((average horizontal residue) + (average vertical residue)).

float big_edge_2_dir: Directional disparity of 2x1 and 1x2 closing residues in the big ring. (average vertical residue) / ((average horizontal residue) + (average vertical residue)).

float nuc_edge_2_dir_sm: nuc_edge_2_dir / sm_edge_2_dir.

float nuc_edge_2_dir_big: nuc_edge_2_dir / big_edge_2_dir.
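
The magnitude and directional disparity formulas reduce to two small functions. The sketch below assumes the average horizontal and vertical residues for a region have already been measured; returning 0.0 when both averages are zero is an assumption, since the text does not say how that case is handled. The function names are hypothetical.

```python
import math

def edge_magnitude(avg_horizontal, avg_vertical):
    """Square root of the sum of the squared average residues."""
    return math.hypot(avg_horizontal, avg_vertical)

def edge_direction(avg_horizontal, avg_vertical):
    """Vertical share of the total residue: 0.5 means no directional bias."""
    total = avg_horizontal + avg_vertical
    if total == 0:
        return 0.0  # assumption: undefined case mapped to 0.0
    return avg_vertical / total

nuc_mag = edge_magnitude(3.0, 4.0)
nuc_dir = edge_direction(3.0, 1.0)
```

Since closing residues are non-negative, the disparity stays between 0.0 (all horizontal residue) and 1.0 (all vertical residue).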

5 Mag

The structuring elements used are a 5x1 and a 1x5. In this case, the residue is calculated with the 2x1 or 1x2 closed images rather than the original as for the 2x1 and 1x2 structuring elements described previously.

float nuc_edge_5_mag: Magnitude of 5x1 and 1x5 closing residues within the nuclei. Square root of ((average horizontal residue)^2 + (average vertical residue)^2).

float sm_edge_5_mag: Magnitude of 5x1 and 1x5 closing residues within the small ring. Square root of ((average horizontal residue)^2 + (average vertical residue)^2).

float big_edge_5_mag: Magnitude of 5x1 and 1x5 closing residues within the big ring. Square root of ((average horizontal residue)^2 + (average vertical residue)^2).

float nuc_edge_5_mag_sm: nuc_edge_5_mag / sm_edge_5_mag.

float nuc_edge_5_mag_big: nuc_edge_5_mag / big_edge_5_mag.

float nuc_edge_5_dir: Directional disparity of 5x1 and 1x5 closing residues within the nuclei. (average vertical residue) / ((average horizontal residue) + (average vertical residue)).

float sm_edge_5_dir: Directional disparity of 5x1 and 1x5 closing residues in the small ring. (average vertical residue) / ((average horizontal residue) + (average vertical residue)).

float big_edge_5_dir: Directional disparity of 5x1 and 1x5 closing residues in the big ring. (average vertical residue) / ((average horizontal residue) + (average vertical residue)).

float nuc_edge_5_dir_sm: nuc_edge_5_dir / sm_edge_5_dir.

float nuc_edge_5_dir_big: nuc_edge_5_dir / big_edge_5_dir.

9 Mag

The last of the angular structuring elements used are a 9x1 and 1x9. In this case, the residue is calculated with the 5x1 or 1x5 closed images rather than the 2x1 or 1x2 closed images used for the 5x1 and 1x5 elements.

float nuc_edge_9_mag: Magnitude of 9x1 and 1x9 closing residues within the nuclei. Square root of ((average horizontal residue)^2 + (average vertical residue)^2).

float sm_edge_9_mag: Magnitude of 9x1 and 1x9 closing residues within the small ring. Square root of ((average horizontal residue)^2 + (average vertical residue)^2).

float big_edge_9_mag: Magnitude of 9x1 and 1x9 closing residues within the big ring. Square root of ((average horizontal residue)^2 + (average vertical residue)^2).

float nuc_edge_9_mag_sm: nuc_edge_9_mag / sm_edge_9_mag.

float nuc_edge_9_mag_big: nuc_edge_9_mag / big_edge_9_mag.

float nuc_edge_9_dir: Directional disparity of 9x1 and 1x9 closing residues within the nuclei. (average vertical residue) / ((average horizontal residue) + (average vertical residue)).

float sm_edge_9_dir: Directional disparity of 9x1 and 1x9 closing residues in the small ring. (average vertical residue) / ((average horizontal residue) + (average vertical residue)).

float big_edge_9_dir: Directional disparity of 9x1 and 1x9 closing residues in the big ring. (average vertical residue) / ((average horizontal residue) + (average vertical residue)).

float nuc_edge_9_dir_sm: nuc_edge_9_dir / sm_edge_9_dir.

float nuc_edge_9_dir_big: nuc_edge_9_dir / big_edge_9_dir.

Blur

As another measure of texture, the original is blurred using a 5x5 binomial filter. A residue is created with the absolute magnitude differences between the original and the blurred image.

float nuc_blur_ave: Average of blur image over label mask.

float nuc_blur_sd: Standard deviation of blur image over label mask.

float nuc_blur_sk: Skewness of blur image over label mask.

float nuc_blur_ku: Kurtosis of blur image over label mask.

float sm_blur_ave: Average of blur image over small ring.

float sm_blur_sd: Standard deviation of blur image over small ring.

float sm_blur_sk: Skewness of blur image over small ring.

float sm_blur_ku: Kurtosis of blur image over small ring.

float big_blur_ave: Average of blur image over big ring.

float big_blur_sd: Standard deviation of blur image over big ring.

float big_blur_sk: Skewness of blur image over big ring.

float big_blur_ku: Kurtosis of blur image over big ring.

float nuc_blur_ave_sm: Average of blur residue for the nuclei divided by the small ring.

float nuc_blur_sd_sm: Standard deviation of blur residue for the nuclei divided by the small ring.

float nuc_blur_sk_sm: Skew of blur residue for the nuclei divided by the small ring.

float nuc_blur_ave_big: Average of blur residue for the nuclei divided by the big ring.

float nuc_blur_sd_big: Standard deviation of blur residue for the nuclei divided by the big ring.

float nuc_blur_sk_big: Skew of blur residue for the nuclei divided by the big ring.
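
The blur residue can be sketched with a separable 5x5 binomial filter (1D weights 1, 4, 6, 4, 1 divided by 16, applied along rows and then columns). Replicating the edge pixels at the image border is an assumption, as is the use of the population standard deviation. The names `blur_5x5_binomial`, `blur_residue` and `masked_stats` are hypothetical.

```python
import statistics

KERNEL = [1, 4, 6, 4, 1]  # binomial weights; they sum to 16

def blur_5x5_binomial(image):
    """Separable 5x5 binomial blur with replicated borders (assumed)."""
    h, w = len(image), len(image[0])
    clamp = lambda v, hi: max(0, min(v, hi))
    rows = [[sum(k * row[clamp(x + d - 2, w - 1)] for d, k in enumerate(KERNEL)) / 16
             for x in range(w)] for row in image]
    return [[sum(k * rows[clamp(y + d - 2, h - 1)][x] for d, k in enumerate(KERNEL)) / 16
             for x in range(w)] for y in range(h)]

def blur_residue(image):
    """Absolute difference between the original and the blurred image."""
    blurred = blur_5x5_binomial(image)
    return [[abs(o - b) for o, b in zip(orow, brow)]
            for orow, brow in zip(image, blurred)]

def masked_stats(residue, mask):
    """Mean and population standard deviation of the residue inside the mask."""
    values = [residue[y][x] for y in range(len(mask))
              for x in range(len(mask[0])) if mask[y][x]]
    return statistics.fmean(values), statistics.pstdev(values)

impulse = [[256 if (y, x) == (2, 2) else 0 for x in range(5)] for y in range(5)]
residue = blur_residue(impulse)
```

A flat region gives a residue of zero, while sharp isolated detail such as the impulse above gives a large residue, which is why the statistics of this image serve as a texture measure.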

float mod_N_C_ratio: A ratio between the nuclear area and the cytoplasm area is calculated. The cytoplasm for each nucleus is determined by taking only the cytoplasm area that falls inside of a skiz boundary between all nuclei objects. The area of the cytoplasm is the number of cytoplasm pixels that are in the skiz area corresponding to the nucleus of interest. The edge of the image is treated as an object and therefore creates a skiz boundary.

float mod_nuc_OD: The average optical density of the nuclei is calculated using floating point representations for each pixel optical density rather than the integer values as implemented in the first version. The optical density values are scaled so that a value of 1.2 is given for pixels of 5 or fewer counts and a value of 0.05 for pixel values of 245 or greater. The pixel values between 5 and 245 span the range logarithmically to meet each boundary condition.
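
One way to read the scaling described for mod_nuc_OD is as a straight line in log10 of the pixel count, pinned to 1.2 at 5 counts and 0.05 at 245 counts. The sketch below implements that reading; the exact interpolation used is not given in the text, so the formula and the function name `pixel_od` are assumptions.

```python
import math

OD_AT_LOW, OD_AT_HIGH = 1.2, 0.05   # boundary optical densities from the text
LOW_COUNT, HIGH_COUNT = 5, 245      # boundary pixel counts from the text

def pixel_od(count):
    """Map a pixel count (0-255) to a floating point optical density."""
    if count <= LOW_COUNT:
        return OD_AT_LOW
    if count >= HIGH_COUNT:
        return OD_AT_HIGH
    # Fraction of the way from LOW_COUNT to HIGH_COUNT on a log scale.
    frac = ((math.log10(count) - math.log10(LOW_COUNT))
            / (math.log10(HIGH_COUNT) - math.log10(LOW_COUNT)))
    return OD_AT_LOW - frac * (OD_AT_LOW - OD_AT_HIGH)

def mean_od(counts):
    """Average optical density over the pixels of a nucleus."""
    return sum(pixel_od(c) for c in counts) / len(counts)

od_mid = pixel_od(35)  # 35 is the log-scale midpoint of 5 and 245
```

Under this reading a count of 35, the geometric mean of 5 and 245, lands halfway between the two boundary densities.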

float mod_nuc_IOD: The summation of the optical density values for each pixel within the nuclei.

float mod_nuc_OD_sm: The average optical density of the nuclei minus the average optical density of the small ring.

float mod_nuc_OD_big: The average optical density of the nuclei minus the average optical density of the big ring.

float mod_nuc_IOD_sm: mod_nuc_OD_sm * number of pixels in the nuclei. Essentially, this is the integrated optical density of the nuclei normalized by the average optical density of the pixels within the small ring around the nuclei.

float mod_nuc_IOD_big: mod_nuc_OD_big * number of pixels in the nuclei. Same as above, except the average optical density in the big ring around the nuclei is used to normalize the data.

OD_bin_*_*

These features are the result of placing each pixel in the nuclear mask area in a histogram where each bin represents a range of optical densities. The numbers should be read as 1_2 = 1.2, 0_825 = 0.825.

The original image is represented as transmission values. These values are converted during the binning process to show equal size bins in terms of optical density, which is a log transformation of the transmission. The Histogram bins refer to the histogram of pixels of transmission values within the nuclear mask.

float OD_bin_1_2: Sum Histogram bins #0 - 22 / Area of label mask.

float OD_bin_1_125: Sum Histogram bins #13 / Area of label mask.

float OD_bin_1_05: Sum Histogram bins #23 - 26 / Area of label mask.

float OD_bin_0_975: Sum Histogram bins #27 - 29 / Area of label mask.

float OD_bin_0_9: Sum Histogram bins #30 - 34 / Area of label mask.

float OD_bin_0_825: Sum Histogram bins #35 - 39 / Area of label mask.

float OD_bin_0_75: Sum Histogram bins #40 - 45 / Area of label mask.

float OD_bin_0_675: Sum Histogram bins #46 - 53 / Area of label mask.

float OD_bin_0_6: Sum Histogram bins #54 - 62 / Area of label mask.

float OD_bin_0_525: Sum Histogram bins #63 - 73 / Area of label mask.

float OD_bin_0_45: Sum Histogram bins #74 - 86 / Area of label mask.

float OD_bin_0_375: Sum Histogram bins #87 - 101 / Area of label mask.

float OD_bin_0_3: Sum Histogram bins #102 - 119 / Area of label mask.

float OD_bin_0_225: Sum Histogram bins #120 - 142 / Area of label mask.

float OD_bin_0_15: Sum Histogram bins #143 - 187 / Area of label mask.

float OD_bin_0_075: Sum Histogram bins #188 - 255 / Area of label mask.
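
The binning can be sketched as a 256-bin histogram of transmission values whose bins are summed over each listed range and divided by the mask area. The table below covers only the ranges from bin #23 upward, and the function name `od_bin_features` is hypothetical; the input is assumed to be the list of transmission values inside the nuclear mask.

```python
# Inclusive histogram bin ranges, taken from the feature list above.
OD_BIN_RANGES = {
    "OD_bin_1_05": (23, 26),    "OD_bin_0_975": (27, 29),
    "OD_bin_0_9": (30, 34),     "OD_bin_0_825": (35, 39),
    "OD_bin_0_75": (40, 45),    "OD_bin_0_675": (46, 53),
    "OD_bin_0_6": (54, 62),     "OD_bin_0_525": (63, 73),
    "OD_bin_0_45": (74, 86),    "OD_bin_0_375": (87, 101),
    "OD_bin_0_3": (102, 119),   "OD_bin_0_225": (120, 142),
    "OD_bin_0_15": (143, 187),  "OD_bin_0_075": (188, 255),
}

def od_bin_features(mask_pixels):
    """Fraction of mask pixels whose transmission value falls in each range."""
    histogram = [0] * 256
    for value in mask_pixels:
        histogram[value] += 1
    area = len(mask_pixels)
    return {name: sum(histogram[lo:hi + 1]) / area
            for name, (lo, hi) in OD_BIN_RANGES.items()}

features = od_bin_features([23, 26, 27, 200, 255, 10])
```

The ranges widen toward high transmission values because equal steps in optical density correspond to progressively larger steps in transmission.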

float context_3a: For this feature, the bounding box of the nucleus is expanded by 15 pixels on each side. The feature is the ratio of the area of other segmented objects which intersect the enlarged box to the compactness of the box, where the compactness is defined as the perimeter of the box squared divided by the area of the box.
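
The arithmetic of context_3a can be sketched as below. The box is assumed to be an (x0, y0, x1, y1) tuple with width x1 - x0, and the area of the other intersecting objects is assumed to be measured elsewhere and passed in. The function names are hypothetical.

```python
def expand_box(box, margin):
    """Grow a bounding box by `margin` pixels on each side."""
    x0, y0, x1, y1 = box
    return (x0 - margin, y0 - margin, x1 + margin, y1 + margin)

def box_compactness(box):
    """Perimeter of the box squared, divided by the area of the box."""
    width, height = box[2] - box[0], box[3] - box[1]
    perimeter = 2 * (width + height)
    return perimeter ** 2 / (width * height)

def context_3a(nucleus_box, other_object_area, margin=15):
    """Area of intersecting neighbors divided by the enlarged box compactness."""
    return other_object_area / box_compactness(expand_box(nucleus_box, margin))

value = context_3a((0, 0, 10, 10), other_object_area=320)
```

For a square box the compactness is always 16, so elongated nuclei (with larger compactness) are scored lower for the same amount of neighboring material.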

float hole_percent: The segmentation is done in several steps. At an intermediate step, the nuclear mask contains holes which are later filled in to make the mask solid. This feature is the ratio of the area of the holes to the total area of the final, solid, mask.

float context_1b: For this feature, the bounding box of the nucleus is expanded by 5 pixels on each side. The feature is the ratio of the area of other segmented objects which intersect the enlarged box to the total area of the enlarged box.

float min_distance: The distance to the centroid of the nearest object from the centroid of the current object.

The Invention Results Descriptions

This section shows all of the results of the invention that are written to the results structure TwentyXResult, which is contained in alh_twentyx.h.

int high_count: Measures dark edge gradient content of the whole original image. This is a measure of how much cellular material may be in the image.

int high_mean: The average value of all pixels in an image that have values between 199 and 250. This feature provides some information about an image's background.

int medium_threshold: lower_limit_0 - lower_limit_1, where lower_limit_0 is the value of low_threshold*30, or 70, whichever is greater. lower_limit_1 is the value of high_mean - 40, or 150, whichever is greater.

int low_threshold: The low threshold value is the result of an adaptive threshold calculation for a certain range of pixel intensities in an image during the segmentation process. It gives a measure for how much dark matter there is in an image. If the threshold is low, there is a fair amount of dark matter in the image. If the threshold is high, there are probably few high density objects in the image.

float time1: Time variables which may be set during the invention processing.

float time2: Same as time1.

float time3: Same as time1.

float time4: Same as time1.

float stain_mean_od: The cumulative value of mean_od for all objects identified as intermediate cells.

float stainsq_mean_od: The cumulative squared value of mean_od for all objects identified as intermediate cells.

float stain_sd_orig2: The cumulative value of sd_orig2 for all objects identified as intermediate cells.

float stainsq_sd_orig2: The cumulative squared value of sd_orig2 for all objects identified as intermediate cells.

float stain_nc_contrast_orig: The cumulative value of nc_contrast_orig for all objects identified as intermediate cells.

float stainsq_nc_contrast_orig: The cumulative squared value of nc_contrast_orig for all objects identified as intermediate cells.

float stain_mean_outer_od_r3: The cumulative value of mean_outer_od_r3 for all objects identified as intermediate cells.

float stainsq_mean_outer_od_r3: The cumulative squared value of mean_outer_od_r3 for all objects identified as intermediate cells.

float stain_nuc_blur_ave: The cumulative value of nuc_blur_ave for all objects identified as intermediate cells.

float stainsq_nuc_blur_ave: The cumulative squared value of nuc_blur_ave for all objects identified as intermediate cells.

float stain_edge_contrast_orig: The cumulative value of edge_contrast_orig for all objects identified as intermediate cells.

float stainsq_edge_contrast_orig: The cumulative squared value of edge_contrast_orig for all objects identified as intermediate cells.
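
A running sum paired with a running sum of squares is enough to recover a mean and a standard deviation later without keeping the individual values. The sketch below shows that arithmetic as one plausible use of the stain and stainsq pairs; the text does not state how the pairs are consumed, and the cell count argument and function names are hypothetical.

```python
import math

def accumulate(values):
    """Build the cumulative value and cumulative squared value of a feature."""
    return sum(values), sum(v * v for v in values)

def mean_and_sd(cumulative, cumulative_sq, count):
    """Recover mean and population standard deviation from the two sums."""
    mean = cumulative / count
    variance = max(cumulative_sq / count - mean * mean, 0.0)  # guard rounding
    return mean, math.sqrt(variance)

# Hypothetical mean_od values for four intermediate cells.
stain, stainsq = accumulate([1.0, 2.0, 3.0, 4.0])
mean, sd = mean_and_sd(stain, stainsq, count=4)
```

Because the sums are additive, values from many fields of view can be combined simply by adding the per-image totals before the final division.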

int intermediate_hist1[10][6]: Histogram representing the features of all intermediate cells identified by the first classifier. 10 bins for IOD, and 6 for nuclear area.

int intermediate_hist2[8][6]: Histogram representing the features of all intermediate cells identified by the second classifier. 8 bins for IOD, and 6 for nuclear area.

int sil_box1_artifact_count: Total number of objects in the image classified as artifacts by the Box1 classifier.

int sil_box2_artifact_count: Total number of objects in the image classified as artifacts by the Box2 classifier.

int sil_box3_artifact_count: Total number of objects in the image classified as artifacts by the first classifier of the Artifact Filter.

int sil_box4_artifact_count: Total number of objects in the image classified as artifacts by the second classifier of the Artifact Filter.

int sil_box5_artifact_count: Total number of objects in the image classified as artifacts by the third classifier of the Artifact Filter.

int conCompCount: The number of objects segmented in the image.

int sil_stage1_normal_count1: Total number of objects classified as normal at the end of the Stage1 classifier.

int sil_stage1_artifact_count1: Total number of objects classified as artifact at the end of the Stage1 classifier.

int sil_stage1_abnormal_count1: Total number of objects classified as abnormal at the end of the Stage1 classifier.

int sil_stage2_normal_count1: Total number of objects classified as normal at the end of the Stage2 94 classifier.

int sil_stage2_artifact_count1: Total number of objects classified as artifact at the end of the Stage2 94 classifier.

int sil_stage2_abnormal_count1: Total number of objects classified as abnormal at the end of the Stage2 94 classifier.

int sil_stage3_normal_count1: Total number of objects classified as normal at the end of the stage3 96 classifier.

int sil_stage3_artifact_count1: Total number of objects classified as artifact at the end of the stage3 96 classifier.

int sil_stage3_abnormal_count1: Total number of objects classified as abnormal at the end of the stage3 96 classifier.

int sil_cluster_stage2_count: The number of objects classified as abnormal by the Stage2 94 classifier which are close to abnormal objects from the stage3 96 classifier.

int sil_cluster_stage1_count: The number of objects classified as abnormal by the Stage1 classifier which are close to abnormal objects from the Stage2 94 classifier.

float sil_est_cellcount: An estimate of the number of squamous cells in the image.

int sil_stage2_alarm_IOD_histo[16]: Histogram representing the IOD of all objects classified as abnormal by the Stage2 94 classifier.

int sil_stage2_alarm_conf_hist[10]: Histogram representing the confidence of classification for all objects classified as abnormal by the Stage2 94 classifier.

int sil_stage3_alarm_IOD_histo[16]: Histogram representing the IOD of all objects classified as abnormal by the stage3 96 classifier.

int sil_stage3_alarm_conf_hist[10]: Histogram representing the confidence of classification for all objects classified as abnormal by the stage3 96 classifier.

int sil_stage1_normal_count2: Total number of objects classified as normal by the Stage1 Box classifier.

int sil_stage1_abnormal_count2: Total number of objects classified as abnormal by the Stage1 Box classifier.

int sil_stage1_artifact_count2: Total number of objects classified as artifact by the Stage1 Box classifier.

int sil_p1_stage2_normal_count2: Total number of objects classified as normal by the Stage2 94 Box classifier.

int sil_p1_stage2_abnormal_count2: Total number of objects classified as abnormal by the Stage2 94 Box classifier.

int sil_p1_stage2_artifact_count2: Total number of objects classified as artifact by the Stage2 94 Box classifier.

int sil_p1_stage3_normal_count2: Total number of objects classified as normal by the stage3 96 Box classifier.

int sil_p1_stage3_abnormal_count2: Total number of objects classified as abnormal by the stage3 96 Box classifier.

int sil_p1_stage3_artifact_count2: Total number of objects classified as artifact by the stage3 96 Box classifier.

int sil_stage4_alarm_count: Total number of objects classified as abnormal by the stage4 98 classifier.

int sil_stage4_prob_hist[12]: Histogram representing the confidence of classification for all objects classified as abnormal by the stage4 98 classifier.

int sil_ploidy_alarm_count1: Total number of objects classified as abnormal by the first ploidy classifier 100.

int sil_ploidy_alarm_count2: Total number of objects classified as abnormal by the second ploidy classifier 100.

int sil_ploidy_prob_hist[12]: Histogram representing the confidence of classification for all objects classified as abnormal by the ploidy classifier 100.

int sil_S4_and_P1_count: Total number of objects classified as abnormal by both the stage4 98 and the first ploidy classifier 100.

int sil_S4_and_P2_count: Total number of objects classified as abnormal by both the stage4 98 and the second ploidy classifier 100.

int atypical_pdf_index[8][8]: A 2D histogram representing two confidence measures of the objects classified as abnormal by the Stage2 94 Box classifier. Refer to the description of the atypicality classifier in this document.

int sil_seg_x_s2_decisive[4]: A 4 bin histogram of the product of the segmentation robustness value and the Stage2 94 decisiveness value.

int sil_seg_x_s3_decisive[4]: A 4 bin histogram of the product of the segmentation robustness value and the stage3 96 decisiveness value.

int sil_s2_x_s3_decisive[4]: A 4 bin histogram of the product of the Stage2 94 decisiveness value and the stage3 96 decisiveness value.

int sil_seg_x_s2_x_s3_decisive[4]: A 4 bin histogram of the product of the segmentation robustness value, the Stage2 94 decisiveness value, and the stage3 96 decisiveness value.

int sil_stage2_dec_x_seg[4][4]: A 4x4 array of Stage2 94 decisiveness (vertical axis) vs. segmentation robustness (horizontal axis).

int B il_stage3_dec_x_seg[4] [4]: A 4x4 array of 

stage3 96 decisiveness (vertical axis) vs. 
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segmentation robustness (horizontal axis) . 

int sil_s3_x_s2_dec_x_seg [4] [4]: A 4x4 array 

of the product of Stage2 94 and stage3 96 
decisiveness (vertical axis) vs. segmentation 
5 robustness (horizontal axis) . 

int sil_s3_x_segrobust_x_s2pc[4][4]: A 4x4
array of the product of segmentation robustness and
stage3 96 decisiveness (vertical axis) vs. the
product of Stage2 94 confidence and Stage2 94
decisiveness (horizontal axis).

int sil_s3_x_segrobust_x_s3pc[4][4]: A 4x4
array of the product of segmentation robustness and
stage3 96 decisiveness (vertical axis) vs. the
product of stage3 96 confidence and stage3 96
decisiveness (horizontal axis).
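
As a minimal sketch of how counts such as sil_seg_x_s2_decisive[4] and sil_stage2_dec_x_seg[4][4] could be accumulated, the Python below assumes that each robustness and decisiveness value lies in [0, 1] and that the four bins are of equal width. Neither assumption is stated in this section, and the helper names bin4, hist_1d and hist_2d are hypothetical.

```python
def bin4(value):
    """Map a value in [0, 1] to one of four equal-width bins (assumed layout)."""
    clamped = min(max(value, 0.0), 1.0)
    return min(int(clamped * 4), 3)  # a value of exactly 1.0 goes in the last bin


def hist_1d(objects):
    """4 bin histogram of segmentation robustness x Stage2 decisiveness."""
    hist = [0, 0, 0, 0]
    for seg_robust, s2_decisive in objects:
        hist[bin4(seg_robust * s2_decisive)] += 1
    return hist


def hist_2d(objects):
    """4x4 array: decisiveness selects the row (vertical axis),
    segmentation robustness selects the column (horizontal axis)."""
    hist = [[0] * 4 for _ in range(4)]
    for seg_robust, s2_decisive in objects:
        hist[bin4(s2_decisive)][bin4(seg_robust)] += 1
    return hist


# Hypothetical (segmentation robustness, Stage2 decisiveness) pairs.
alarms = [(0.9, 0.9), (0.5, 0.4), (1.0, 1.0), (0.1, 0.8)]
print(hist_1d(alarms))  # [2, 0, 0, 2]
print(hist_2d(alarms))
```

The product histograms reward objects that score well on every measure at once, while the 4x4 arrays keep the two measures separate so that a weak segmentation can be told apart from a weak classification.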

float sil_stage3_ftr[NUM_FOV_ALM][LEN_FOV_FTR]: A
set of 8 features for an object which was classified
as abnormal by the stage3 96 classifier. NUM_FOV_ALM
refers to the number of the alarm as it was detected
in the 20x scan (up to 50 will have features
recorded). LEN_FOV_FTR refers to the feature number:
0-7.
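
The layout of sil_stage3_ftr can be illustrated with a short Python sketch. The limits of 50 alarms and 8 features come from the description above; the zero fill for unused rows and the function name record_alarm_features are assumptions made for illustration.

```python
NUM_FOV_ALM = 50  # at most 50 alarms have features recorded (from the text)
LEN_FOV_FTR = 8   # features 0-7 for each alarm (from the text)


def record_alarm_features(alarm_features):
    """Fill a NUM_FOV_ALM x LEN_FOV_FTR table in detection order.

    Alarms beyond the first 50 are dropped; unused rows stay at 0.0
    (the zero fill is an assumption, not stated in the text).
    """
    table = [[0.0] * LEN_FOV_FTR for _ in range(NUM_FOV_ALM)]
    for alarm_index, features in enumerate(alarm_features[:NUM_FOV_ALM]):
        if len(features) != LEN_FOV_FTR:
            raise ValueError("each alarm needs exactly 8 features")
        table[alarm_index] = [float(f) for f in features]
    return table


# 60 hypothetical alarms, each carrying its own index as all 8 features.
sil_stage3_ftr = record_alarm_features([[i] * 8 for i in range(60)])
print(len(sil_stage3_ftr), sil_stage3_ftr[49][0])  # 50 49.0
```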

Cell Types Recognized by the Invention

The invention has been trained to recognize
single or free lying cell types: normal, potentially
abnormal, and artifacts that typically appear in
Papanicolaou-stained cervical smears. This section
lists the cell types that were used to train the
invention.

Normal Single Cells

single superficial squamous
single intermediate squamous
single squamous metaplastic
single parabasal squamous
single endocervical
single endometrial
red blood cells

Abnormal Single Cells

single atypical squamous
single atypical metaplastic
single atypical endocervical columnar
single atypical endometrial
single low grade SIL
single high grade SIL
single endocervical columnar dysplasia, well segmented
single carcinoma in situ, endocervical columnar, well segmented
single adenocarcinoma, endocervical columnar
single adenocarcinoma, endometrial
single adenocarcinoma, metaplastic
single invasive carcinoma, small cell squamous
single invasive carcinoma, large cell squamous
single invasive carcinoma, keratinizing squamous
single marked repair/reactive squamous
single marked repair/reactive, endocervical
single marked repair/reactive, metaplastic
single herpes
single histiocyte
single lymphocyte
single slightly enlarged superficial squamous
single slightly enlarged intermediate squamous
single slightly enlarged metaplastic squamous
single slightly enlarged parabasal squamous
slightly enlarged endocervical

Artifacts

single air dried intermediate cell nucleus
single air dried metaplastic/parabasal cell nucleus
single air dried endocervical cell nucleus
single questionable abnormal cell nucleus
single over segmented intermediate cell nucleus
single over segmented metaplastic/parabasal cell nucleus
single artifact, 1 nucleus over segmented
artifact, 2 nuclei
artifact, 3+ nuclei
single folded cytoplasm
cytoplasm only
bare nucleus
unfocused
polymorphs (white blood cells)
graphites
corn flaking
mucous
junk from cover slip
other junk

The invention has been described herein in
considerable detail in order to comply with the
Patent Statutes and to provide those skilled in the
art with the information needed to apply the novel
principles and to construct and use such specialized
components as are required. However, it is to be
understood that the invention can be carried out by
specifically different equipment and devices, and
that various modifications, both as to the equipment
details and operating procedures, can be
accomplished without departing from the scope of the
invention itself.

What is claimed is:

CLAIMS

1. A cell identification apparatus for identifying
object types of interest, the apparatus
comprising:

(a) an image segmenter means (10) for
processing at least one image (11) of a
biological specimen having a segmented
image output;

(b) feature calculation means (12) for
computing features having at least one
feature output; and

(c) means for classifying objects (14),
connected to receive the at least one
feature output, having a classified output
where the classified output identifies
objects (80) as being object types of
interest.

2. The apparatus of claim 1 wherein the feature
calculation means (12) comprises an object
feature extractor.

3. The apparatus of claim 1 wherein the feature
calculation means (12) comprises a contextual
feature extractor.

4. The apparatus of claim 1 wherein the feature
calculation means (12) comprises a whole image
feature extractor.

5. The apparatus of claim 1 wherein the objects
(80, 82) comprise free-lying cells.

6. The apparatus of claim 1 wherein the objects
(80, 82) comprise non-nuclear overlapped cells.

7. The apparatus of claim 1 wherein the object
types of interest (80, 82) comprise normal
cells, abnormal cells or artifacts.

8. The apparatus of claim 7 wherein the normal
cells comprise reference intermediate cells
(142).

9. The apparatus of claim 7 wherein the abnormal
cells comprise cancerous and precancerous
cells.

10. The apparatus of claim 1 wherein the biological
specimen is a specimen prepared by the
Papanicolaou method.

11. The apparatus of claim 1 wherein the biological
specimen is a gynecological specimen.

12. The apparatus of claim 1 further comprising a
means for accumulating the classified output
(18).



13. The apparatus of claim 1 comprising a means for
measuring a stain (92) of at least one type of
object (142, 144, 146, 148).

14. The apparatus of claim 13 wherein the at least
one type of object (80) comprises reference
intermediate cells (142).

15. The apparatus of claim 1 further comprising a
means for measuring a classification confidence
(216) for a set of objects (80, 82) classified
as being object types of interest (80, 82).

16. The apparatus of claim 1 further comprising a
means for measuring a reliability of object
segmentation (24).

17. The apparatus of claim 1 further comprising a
means for measuring repeatability of
classification results (Figure 7B).

18. A free-lying cell segmenter (10) comprising:

(a) a means for acquiring at least one image
(28) of a biological specimen having an
image output (29);

(b) a means for creating a contrast enhanced
image (30) having an enhanced image output
(31) wherein the means for creating a
contrast enhanced image (30) is connected
to receive the at least one image (29);

(c) a means for image thresholding (32) having
an image threshold output (33) wherein the
means for image thresholding (32) is
connected to receive the contrast enhanced
image (31); and

(d) a means for object refinement (34) having
a refined object output wherein the means
for object refinement (34) is connected to
receive the thresholded image output (33).
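
The chain recited in claim 18 (contrast enhancement, then thresholding, then object refinement) can be sketched in Python as follows. This is an illustration under stated assumptions, not the segmenter of the invention: the linear grey-level stretch, the fixed threshold of 128 applied to dark objects, and refinement by discarding small 4-connected components are all choices made here for the example, and every function name is hypothetical.

```python
def enhance_contrast(image):
    """Linearly stretch grey levels to span 0..255 (assumed enhancement)."""
    lo = min(min(row) for row in image)
    hi = max(max(row) for row in image)
    span = max(hi - lo, 1)  # avoid division by zero on a flat image
    return [[(p - lo) * 255 // span for p in row] for row in image]


def threshold(image, level=128):
    """Binary mask: 1 where the pixel is darker than `level` (assumed rule)."""
    return [[1 if p < level else 0 for p in row] for row in image]


def refine_objects(mask, min_area=2):
    """Remove 4-connected components smaller than `min_area` pixels."""
    rows, cols = len(mask), len(mask[0])
    refined = [[0] * cols for _ in range(rows)]
    seen = set()
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] == 0 or (r, c) in seen:
                continue
            stack, component = [(r, c)], []
            seen.add((r, c))
            while stack:  # depth-first flood fill of one component
                y, x = stack.pop()
                component.append((y, x))
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] == 1 and (ny, nx) not in seen):
                        seen.add((ny, nx))
                        stack.append((ny, nx))
            if len(component) >= min_area:
                for y, x in component:
                    refined[y][x] = 1
    return refined


def segment(image):
    """Enhance, threshold, then refine, in the order recited in claim 18."""
    return refine_objects(threshold(enhance_contrast(image)))


# Hypothetical 4x4 grey-level image: one 3-pixel object and one stray pixel.
image = [
    [200, 200, 200, 200],
    [200, 100, 100, 200],
    [200, 100, 200, 200],
    [200, 200, 200, 110],
]
for row in segment(image):
    print(row)
```

In this example the isolated dark pixel survives thresholding but is removed by the refinement step, which is the role a refinement stage plays after a raw threshold.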

19. A feature classifier for performing a plurality
of stages of feature extraction (12) and object
classification (14) on cells in a biological
specimen comprising:

(a) means for acquiring at least one image
(28) of a biological specimen;

(b) an initial stage classifier means (90) for
determining whether objects (80, 82) in
the at least one image are object types of
interest and other objects; and

(c) a sequence of object classifiers (92, 94,
96, 98, 100) wherein each object
classifier has an object type of interest
input, an object type of interest output
and an other object type output, and
wherein the object type of interest output
is connected to the object type of
interest input of a next classifier (92,
94, 96, 98, 100) in the sequence.
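
The connection recited in element (c) of claim 19, where the object type of interest output of one classifier feeds the input of the next, can be sketched as a simple cascade. The Python below is a minimal illustration; the feature names, thresholds and the function run_cascade are hypothetical and do not describe the classifiers of the invention.

```python
def run_cascade(objects, stages):
    """Pass objects through the stages in order.

    Each stage is a (name, predicate) pair; the predicate returns True when
    the object is still an object type of interest. Only those objects reach
    the next stage. Returns the objects of interest after the last stage and
    a dict of rejected objects keyed by the stage that rejected them.
    """
    rejected = {}
    remaining = list(objects)
    for name, is_of_interest in stages:
        passed = []
        for obj in remaining:
            if is_of_interest(obj):
                passed.append(obj)  # "object type of interest" output
            else:
                rejected.setdefault(name, []).append(obj)  # "other object" output
        remaining = passed
    return remaining, rejected


# Hypothetical features and thresholds, for illustration only.
stages = [
    ("stage1", lambda o: o["nuclear_area"] > 50),
    ("stage2", lambda o: o["density"] > 0.4),
    ("stage3", lambda o: o["texture"] > 0.2),
]
objects = [
    {"id": "a", "nuclear_area": 80, "density": 0.7, "texture": 0.5},
    {"id": "b", "nuclear_area": 30, "density": 0.9, "texture": 0.9},
    {"id": "c", "nuclear_area": 90, "density": 0.1, "texture": 0.9},
]
of_interest, rejected = run_cascade(objects, stages)
print([o["id"] for o in of_interest])  # ['a']
print({name: [o["id"] for o in objs] for name, objs in rejected.items()})
```

Because each stage only sees what the previous stage passed, later and presumably more expensive stages run on progressively fewer objects.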

20. The apparatus of claim 19 further comprising:

(a) an initial box filter means (90) for
determining whether objects (80, 82) are
normal, potentially abnormal or artifacts;

(b) a stage 1 classifier means (92) for
processing the normal and potentially
abnormal objects into a potentially
abnormal, artifact or normal object;

(c) a stage 2 classifier means (94) for
determining whether the potentially
abnormal objects from the stage 1
classifier (92) are potentially abnormal,
artifact or normal;

(d) a stage 3 classifier (96) for determining
whether the potentially abnormal objects
from the stage 2 classifier (94) are
potentially abnormal or are normal and
artifact objects;

(e) a stage 4 classifier (98) for determining
whether the potential abnormal objects
from the stage 3 classifier (96) are
potentially abnormal or normal artifacts.

21. The apparatus of claim 19 further comprising a
diagnostic classifier means (100) for
determining whether the objects of interest
(80, 82) from a final classifier (96) in the
sequence of classifiers are low grade squamous
intraepithelial lesions, potential high grade
squamous intraepithelial lesions, cancerous
lesions and normal artifacts.

22. The apparatus of claim 19 wherein the object
types of interest (80, 82) comprise normal
cells (142), abnormal cells and artifacts.

23. The apparatus of claim 22 wherein the normal
cells (142) comprise reference intermediate
cells.

24. The apparatus of claim 22 wherein the abnormal
cells comprise cancerous and precancerous
cells.

25. The apparatus of claim 19 wherein the
biological specimen is a specimen prepared by
the Papanicolaou method.

26. The apparatus of claim 19 wherein the
biological specimen is a gynecological
specimen.



27. The apparatus of claim 19 further comprising a
means for computing (94) an atypicality index
(22).

28. The apparatus of claim 20 wherein the initial
box filter (90) further comprises a filter
selected from the group consisting of a dark
object filter (104), an unfocused object filter
(106), a polymorphonuclear leukocytes filter, a
graphite filter (108), and a cytoplasm filter
(110).

29. The apparatus of claim 19 wherein at least one
of the classifiers in the sequence of object
classifiers (90, 92, 94, 96, 98, 100) comprises
a box filter (90).

30. The apparatus of claim 19 wherein at least one
of the classifiers in the sequence of object
classifiers (90, 92, 94, 96, 98, 100) comprises
a decision tree classifier (Figure 7B).

31. The apparatus of claim 19 wherein at least one
of the classifiers in the sequence of object
classifiers (90, 92, 94, 96, 98, 100) comprises
a binary decision tree classifier (Figure 7B).

32. The apparatus of claim 19 wherein at least one
of the classifiers in the sequence of object
classifiers (90, 92, 94, 96, 98, 100) comprises
a fuzzy classifier.

33. The apparatus of claim 19 wherein at least one
of the classifiers in the sequence of object
classifiers (90, 92, 94, 96, 98, 100) comprises
a non-parametric classifier.

34. The apparatus of claim 19 wherein at least one
of the classifiers (Figure 8) in the sequence
of object classifiers (90, 92, 94, 96, 98, 100)
further comprises means for measuring
confidence (216).

35. The apparatus of claim 20 wherein the stage 4
classifier (98) comprises:

(a) a feature combination classifier (202) for
classifying objects as normal or abnormal;

(b) a means for computing a probability (210)
of abnormal objects being abnormal;

(c) a means for combining (206) a second set
of features to determine whether the
object is classified as normal or
abnormal;

(d) a means for computing a probability (214)
of the object being abnormal; and

(e) a means for combining (216) the first
probability (210) and the second
probability (214) to produce a final
confidence factor.

36. The apparatus of claim 21 wherein the
diagnostic classifier, being a ploidy
classifier, further comprises:

(a) means for computing a probability that the
object is abnormal (224);

(b) means for computing whether the object is
classified as aneuploid (230);

(c) means for computing a probability that the
object is aneuploid (232); and

(d) means for combining the first probability
and the second probability to provide a
final confidence (234).
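
Claims 35 and 36 each recite combining a first probability and a second probability into a final confidence, without stating the combination rule. The Python sketch below assumes a simple product, which is only one possible choice; the function name final_confidence is hypothetical.

```python
def final_confidence(first_probability, second_probability):
    """Combine two probabilities into one confidence value.

    The product is an assumed combination rule; the claims only say that
    the two probabilities are combined.
    """
    for p in (first_probability, second_probability):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
    return first_probability * second_probability


print(final_confidence(0.5, 0.5))  # 0.25
```

Under this assumed rule the final confidence is high only when both classifiers are confident, so a single doubtful stage pulls the result down.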

37. The apparatus of claim 19 further including a
plurality of computer processors (540) wherein
the plurality of computer processors (540)
perform multilayered processing.

38. An apparatus for computing a stain score from a
biological specimen comprising:

(a) means for acquiring at least one image
(28) of a biological specimen;

(b) means for classifying objects (14) that
are object types of interest (142, 144,
146, 148) in the at least one image (28),
wherein the means for classifying objects
(14) provides a classified object output;

(c) means for measuring stain feature values
(92) from the objects of interest (142,
144, 146, 148), connected to the
classified object output, wherein the
means for measuring stain feature values
(92) has a stain feature value output; and

(d) means for accumulating stain feature
values (18) connected to the stain feature
value output, and wherein the means for
accumulating stain feature values (18)
generates a stain score output (21).



39. The apparatus of claim 38 wherein the stain
feature values (21) comprise a density of an
object of interest (142, 144, 146, 148).

40. The apparatus of claim 38 wherein the stain
feature values (21) comprise texture of the
object of interest (142, 144, 146, 148).

41. The apparatus of claim 38 wherein the stain
feature (21) comprises a difference in at least
one feature of the objects of interest (142,
144, 146, 148) and at least one feature
measurement of the background of the objects of
interest.
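
The accumulation recited in claims 38 through 41 can be sketched as follows. The Python below assumes the stain feature is the difference between an object's density and its background density (in the spirit of claim 41) and that the stain score is the mean of that difference over the accepted objects. The averaging rule, the dictionary keys and the type label "reference_intermediate" are assumptions made for illustration.

```python
def stain_score(objects, types_of_interest=("reference_intermediate",)):
    """Mean object-minus-background density over objects of interest.

    Returns None when no object of an accepted type was found.
    """
    differences = [
        obj["density"] - obj["background_density"]
        for obj in objects
        if obj["type"] in types_of_interest
    ]
    if not differences:
        return None
    return sum(differences) / len(differences)


# Hypothetical classified objects with measured densities.
classified = [
    {"type": "reference_intermediate", "density": 0.75, "background_density": 0.25},
    {"type": "reference_intermediate", "density": 0.5, "background_density": 0.25},
    {"type": "artifact", "density": 1.0, "background_density": 0.0},
]
print(stain_score(classified))  # 0.375
```

Restricting the measurement to classified objects of a known type keeps artifacts from distorting the score.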

42. An apparatus for measuring the repeatability of
classification for a biological specimen
comprising:

(a) means for acquiring at least one image
(10) of a biological specimen;

(b) means, connected to receive the at least
one image, for computing object features
(12) having an object features output;

(c) means for classifying objects (14)
connected to the object features output,
wherein the means for classifying objects
provides a classified object output;

(d) means for estimating a classification
repeatability (Figure 7B) of object types,
connected to the classified object output
and object features output, wherein the
means for estimating (Figure 7B) has a
classification repeatability output.

43. The apparatus of claim 42 wherein the means for
estimating the classification repeatability
(Figure 7B) further comprising feature distance
measuring means for computing a distance from a
feature value to a classification boundary
(Figure 6B) of the objects of interest.
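
Claim 43 ties repeatability to the distance between a feature value and a classification boundary. As a minimal sketch, assuming the boundary is a single threshold on one feature, the Python below normalizes that distance by an assumed feature spread and maps it to a value between 0 and 1. The parameter feature_scale, the mapping d / (1 + d) and both function names are hypothetical.

```python
def boundary_distance(feature_value, threshold, feature_scale):
    """Distance from a feature value to a threshold boundary, in units of
    an assumed feature spread `feature_scale` (hypothetical parameter)."""
    if feature_scale <= 0:
        raise ValueError("feature_scale must be positive")
    return abs(feature_value - threshold) / feature_scale


def repeatability(feature_value, threshold, feature_scale):
    """Map the distance to [0, 1): 0 on the boundary, near 1 far from it.

    d / (1 + d) is an assumed mapping chosen for illustration.
    """
    d = boundary_distance(feature_value, threshold, feature_scale)
    return d / (1.0 + d)


print(repeatability(12.0, 10.0, 2.0))  # distance 1.0 -> 0.5
print(repeatability(10.0, 10.0, 2.0))  # on the boundary -> 0.0
```

An object whose feature value sits on the boundary could change class under a small measurement variation, so it receives the lowest score in this sketch.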

44. An apparatus for measuring the reliability for
object segmentation of a biological specimen
comprising:

(a) means for acquiring at least one image
(28) of a biological specimen having an
image output (29);

(b) means for image segmentation (10)
connected to the image output (29) to
detect objects of interest (80, 82),
wherein the means for image segmentation
(10) has a segmented object output;

(c) means for feature extraction (12)
connected to the segmented object output,
wherein the means for feature extraction
(12) has a segmentation reliability
feature output (24);

(d) means for classification of objects (14)
connected to the segmentation reliability
feature output (24) having a classified
output (216), where the classified output
(216) comprises a measure of the
reliability of the segmented object
output.

45. A feature classification process for performing
a plurality of stages of feature extraction and
object classification on cells in a biological
specimen comprising:

(a) an initial box filter means (90) for
determining whether objects (80, 82) are
normal and potentially abnormal or
artifacts;

(b) a stage 1 classifier means (92) for
processing the normal and potentially
abnormal objects into a potentially
abnormal, artifact or normal object;

(c) a stage 2 classifier means (94) for
determining whether the potentially
abnormal objects from the stage 1
classifier (92) are potentially abnormal,
artifact or normal;

(d) a stage 3 classifier (96) for determining
whether the potentially abnormal objects
from the stage 2 classifier (94) are
potentially abnormal or are normal and
artifact objects; and

(e) a stage 4 classifier (98) for determining
whether the potential abnormal objects
from the stage 3 classifier (96) are
potentially abnormal or are normal
artifacts.

46. The apparatus of claim 27 further comprising a
diagnostic classifier means (100) for
determining whether the objects of interest
(80, 82) in the output of the stage 3
classifier (96) are low grade squamous
intraepithelial lesions, potential high grade
squamous intraepithelial lesions, cancerous
lesions or normal artifacts.